Abstract

3D Gaussian splatting, a novel differentiable rendering technique, has achieved state-of-the-art novel view synthesis results with high rendering speeds and relatively low training times. However, its performance on scenes commonly seen in indoor datasets is poor due to the lack of geometric constraints during optimization. We extend 3D Gaussian splatting with depth and normal cues to tackle challenging indoor datasets and showcase techniques for efficient mesh extraction, an important downstream application. Specifically, we regularize the optimization procedure with depth information, enforce local smoothness of nearby Gaussians, and use the geometry of the 3D Gaussians supervised by normal cues to achieve better alignment with the true scene geometry. We improve depth estimation and novel view synthesis results over baselines and show how this simple yet effective regularization technique can be used to directly extract meshes from the Gaussian representation, yielding more physically accurate reconstructions on indoor scenes. Our code will be released at https://github.com/maturk/dn-splatter.

Enhancing Gaussian splatting with handheld device data and network priors for improved mesh reconstruction.

Overview

  • This paper introduces a depth and normal regularization method to refine 3D Gaussian splatting for more photorealistic and geometrically accurate indoor scene reconstruction.

  • The method incorporates depth information via per-pixel depth estimates and uses a gradient-aware logarithmic depth loss and total variation loss for smoothness, enhancing performance in texture-less or poorly observed areas.

  • It derives normals directly from the 3D Gaussians and uses monocular normal priors for better alignment with real surface boundaries, avoiding additional learnable parameters for normal prediction.

  • Experimental validation shows the method's superiority over state-of-the-art 3D reconstruction techniques, demonstrating notable improvements in photorealism and geometric accuracy across various indoor datasets.

Depth and Normal Supervision Enhancements for 3D Gaussian Splatting and Mesh Reconstruction

Gaussian Splatting with Depth and Normal Priors

3D Gaussian splatting (3DGS) is a compelling approach to inverse rendering built on differentiable 3D Gaussian primitives. Although 3DGS offers real-time rendering and an explicit scene representation, it suffers from geometric ambiguities and artifacts because its optimization imposes no 3D or surface constraints. This study introduces a depth and normal regularization method aimed at refining 3D Gaussian splatting for indoor scene reconstruction. By incorporating depth and smoothness priors and aligning Gaussians with scene geometry through monocular normal cues, the method improves both photorealism and geometric fidelity.

Incorporating Depth Information

The method leverages per-pixel depth estimates, computed with a discrete volume rendering approximation, to enforce geometric constraints. To account for the noise characteristics of common commercial depth sensors, it employs a gradient-aware logarithmic depth loss together with a total variation loss that promotes smoothness. The depth priors come from sensors, or from monocular depth estimation networks for datasets without sensor depth, and the regularization reduces ambiguities in texture-less or poorly observed regions of indoor scenes.
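
As a concrete illustration, here is a minimal PyTorch sketch of the two depth regularizers. The logarithmic penalty, the image-gradient weighting, and the total variation term follow the recipe described above, but the function names, the exact edge-weight formula, and all shapes are illustrative assumptions rather than the paper's released code.

```python
import torch

def log_depth_loss(pred_depth, sensor_depth, rgb):
    """Gradient-aware logarithmic depth loss (sketch).

    pred_depth, sensor_depth: (H, W) tensors; rgb: (H, W, 3) tensor in [0, 1].
    """
    # A logarithmic penalty is robust to the heavy-tailed noise of
    # commodity depth sensors.
    err = torch.log(1.0 + torch.abs(pred_depth - sensor_depth))

    # Down-weight the loss near image edges, where true depth
    # discontinuities are expected (assumed weighting scheme).
    gray = rgb.mean(dim=-1)
    gx = torch.zeros_like(gray)
    gy = torch.zeros_like(gray)
    gx[:, 1:] = (gray[:, 1:] - gray[:, :-1]).abs()
    gy[1:, :] = (gray[1:, :] - gray[:-1, :]).abs()
    edge_weight = torch.exp(-(gx + gy))

    return (edge_weight * err).mean()

def tv_loss(pred_depth):
    """Total variation loss: encourages locally smooth rendered depth."""
    dx = (pred_depth[:, 1:] - pred_depth[:, :-1]).abs().mean()
    dy = (pred_depth[1:, :] - pred_depth[:-1, :]).abs().mean()
    return dx + dy
```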

Normal Estimation and Regularization

By deriving normals directly from the geometry of the 3D Gaussians, the method aligns the Gaussian primitives with the real surface boundaries of the scene. This avoids additional learnable parameters for normal prediction, favoring a regularization strategy grounded in the geometry of the Gaussians themselves. Monocular normal priors, obtained from off-the-shelf networks, serve as the supervision signal and yield smoother, more geometrically plausible results than normals estimated from depth gradients.
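
The sketch below shows one way to realize this: for a flattened Gaussian, the shortest axis of its covariance ellipsoid approximates the local surface normal, so that axis (oriented toward the camera) can serve as the per-Gaussian normal, with the rendered normal map supervised against the monocular priors. Function names and the choice of an L1 penalty are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def gaussian_normals(rotations, scales, means, cam_pos):
    """rotations: (N, 3, 3) rotation matrices; scales: (N, 3) per-axis
    scales; means: (N, 3) Gaussian centers; cam_pos: (3,) camera center."""
    # The shortest axis of the covariance ellipsoid approximates the
    # local surface normal for a flattened Gaussian.
    idx = torch.argmin(scales, dim=-1)                   # (N,)
    normals = rotations[torch.arange(len(idx)), :, idx]  # (N, 3)
    # Flip normals so they face the camera.
    to_cam = F.normalize(cam_pos - means, dim=-1)
    flip = torch.where((normals * to_cam).sum(-1, keepdim=True) < 0,
                       -1.0, 1.0)
    return F.normalize(normals * flip, dim=-1)

def normal_loss(rendered, mono_prior):
    """L1 supervision of rendered normals against monocular priors."""
    return torch.abs(rendered - mono_prior).mean()
```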

Optimization and Mesh Extraction

The optimization objective combines a photometric loss with the depth and normal regularization losses, balancing faithful scene geometry against visual artifacts. Beyond optimization, the study explores direct mesh extraction from the Gaussian representation via Poisson surface reconstruction. The improved depth and normal estimates lead to more accurate and smoother reconstructions, showing that meshable surfaces can be extracted directly from optimized Gaussian scenes.
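
A hedged sketch of both steps follows: the combined objective (with illustrative loss weights, not the paper's tuned values) and Poisson surface reconstruction from points and normals sampled off the optimized Gaussians, here via Open3D's screened Poisson implementation as an assumed but common choice.

```python
import numpy as np
import open3d as o3d

def total_loss(rgb_loss, depth_loss, tv, normal_loss,
               lambda_depth=0.2, lambda_normal=0.1):
    # Illustrative weights; the real balance is a tuning choice.
    return (rgb_loss
            + lambda_depth * (depth_loss + tv)
            + lambda_normal * normal_loss)

def poisson_mesh(points, normals, depth=9):
    """points, normals: (N, 3) arrays sampled from the optimized Gaussians."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    pcd.normals = o3d.utility.Vector3dVector(normals)
    mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=depth)
    # Trim low-density vertices, which tend to be hallucinated surface
    # far from any input point.
    densities = np.asarray(densities)
    mesh.remove_vertices_by_mask(densities < np.quantile(densities, 0.05))
    return mesh
```

Because the Gaussians already carry oriented normals after optimization, no separate normal estimation pass over the point cloud is needed before meshing, which is what makes direct extraction from the Gaussian scene practical.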

Experimental Validation

The effectiveness of the proposed regularization strategy is demonstrated across various indoor datasets. Compared to state-of-the-art 3D reconstruction methods, including NeRF- and SDF-based models, the approach achieves notable improvements in both photorealism and geometric accuracy. In particular, on challenging real-world scenes from the MuSHRoom and ScanNet++ datasets, the method outperforms baseline models in depth estimation and novel view synthesis.

Conclusion and Future Prospects

This study demonstrates the value of depth and normal priors in improving the quality of 3D Gaussian splatting for scene reconstruction. By producing more realistic and geometrically consistent reconstructions of indoor environments, the proposed method points to a promising direction for inverse rendering. Adapting the approach to sparser or more challenging captures, and exploring more sophisticated mesh extraction techniques, are identified as key avenues for further research in 3D computer vision and graphics.
