
CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians

(arXiv:2403.19495)
Published Mar 28, 2024 in cs.CV and cs.GR

Abstract

The field of 3D reconstruction from images has rapidly evolved in the past few years, first with the introduction of Neural Radiance Fields (NeRF) and more recently with 3D Gaussian Splatting (3DGS). The latter provides a significant edge over NeRF in terms of training and inference speed, as well as reconstruction quality. Although 3DGS works well for dense input images, the unstructured, point-cloud-like representation quickly overfits to the more challenging setup of extremely sparse input images (e.g., 3 images), creating a representation that appears as a jumble of needles from novel views. To address this issue, we propose regularized optimization and depth-based initialization. Our key idea is to introduce a structured Gaussian representation that can be controlled in 2D image space. We then constrain the Gaussians, in particular their positions, and prevent them from moving independently during optimization. Specifically, we introduce single- and multiview constraints through an implicit convolutional decoder and a total variation loss, respectively. With the coherency introduced to the Gaussians, we further constrain the optimization through a flow-based loss function. To support our regularized optimization, we propose an approach to initialize the Gaussians using monocular depth estimates at each input view. We demonstrate significant improvements compared to state-of-the-art sparse-view NeRF-based approaches on a variety of scenes.

Using monocular depth for 3D Gaussian initialization and aligning representations via flow correspondences for consistency.

Overview

  • The paper enhances 3D reconstruction from sparse views by combining regularized optimization and depth-based initialization with 3D Gaussian Splatting.

  • 3D Gaussian Splatting's performance in sparse input scenarios is improved by imposing explicit 2D image space constraints, leading to higher quality 3D reconstructions.

  • Introduces significant advancements over sparse-view NeRF-based methods through a novel structured Gaussian representation and a comprehensive optimization framework.

  • Demonstrates the method's superior performance in texture and geometry reconstruction from sparse views, suggesting potential for future research in complex scene geometries.

Enhancing Sparse-View 3D Reconstruction with Coherent Gaussian Optimization

Introduction

Advancements in 3D reconstruction from posed images have significantly benefited from the development of explicit structured representations like 3D Gaussian Splatting (3DGS). Despite the superior training speed and real-time inference offered by 3DGS, its performance degrades in scenarios with extremely sparse input images. The paper introduces a novel approach, regularized optimization alongside depth-based initialization, to address this limitation. The method leverages a structured Gaussian representation and imposes explicit 2D image-space constraints to enhance the representation's coherency, significantly improving the quality of 3D reconstruction from sparse views.

3D Gaussian Splatting and Sparse-View Challenges

3DGS has emerged as a promising technique for 3D scene representation, outpacing its predecessors in both speed and reconstruction quality. However, its performance deteriorates with sparse inputs, making it difficult to depict the scene accurately without overfitting. The paper critically analyzes these challenges and contrasts 3DGS with Neural Radiance Field (NeRF)-based methods, which often struggle under sparse-view conditions due to insufficient regularization.

Novel Contributions

The paper's foremost contributions are threefold:

  • A structured Gaussian representation for 3D reconstruction from sparse inputs, together with a novel regularized optimization framework that incorporates both single- and multiview constraints.
  • A depth-based initialization method for the 3D Gaussians that uses monocular depth estimates at each input view, significantly improving the starting point for the optimization (see the sketch after this list).
  • Superior performance over state-of-the-art sparse-view NeRF-based approaches across various scenes, validated through rigorous evaluation and comparison.
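
To make the initialization concrete, the following Python/PyTorch snippet is a minimal sketch (not the authors' code) of the idea behind the second contribution: every pixel of an input view is unprojected with its monocular depth estimate to place one Gaussian center per pixel. The function name, the pinhole-intrinsics convention, and the camera-to-world pose variable are assumptions made for illustration.

    import torch

    def init_gaussians_from_depth(image, depth, K, cam_to_world):
        """Illustrative sketch: place one Gaussian per pixel by unprojecting
        monocular depth. image: (H, W, 3), depth: (H, W), K: (3, 3) pinhole
        intrinsics, cam_to_world: (4, 4) camera pose. Returns per-pixel
        Gaussian centers (H*W, 3) and their initial colors (H*W, 3)."""
        H, W = depth.shape
        v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
        pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()   # homogeneous pixel coords
        rays = pix.reshape(-1, 3) @ torch.linalg.inv(K).T               # back-project to camera rays
        pts_cam = rays * depth.reshape(-1, 1)                           # scale each ray by its depth
        pts_hom = torch.cat([pts_cam, torch.ones(H * W, 1)], dim=-1)
        centers = (pts_hom @ cam_to_world.T)[:, :3]                     # transform into world space
        colors = image.reshape(-1, 3)                                   # seed colors from the pixels
        return centers, colors

Because monocular depth estimates are only defined up to scale and can disagree across views, the paper couples this initialization with the regularized optimization described next; the sketch deliberately omits that refinement.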

Technique Overview

The core technique assigns a Gaussian to each pixel of the input images and imposes 2D image-space constraints to keep the Gaussians coherent during optimization. Key innovations include an implicit convolutional decoder for enforcing single-view constraints and a total variation loss for multiview coherence. A flow-based loss function further refines the optimization, ensuring positional consistency across views. Careful initialization of the 3D Gaussians using monocular depth predictions is also critical, as it addresses the variability and inconsistency inherent in depth estimates from sparse inputs.
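
To give a flavor of the regularizers described above, the sketch below shows, in Python/PyTorch, a generic total variation penalty (illustrated on a per-view depth map) and a flow-based consistency term in which a Gaussian tied to a pixel of one input view, when projected into another view, should land near the correspondence predicted by optical flow. This is an illustrative reading of the ideas summarized here, not the paper's exact formulation; the function names, the pinhole projection convention, and the (H, W, 2) flow-field layout are assumptions.

    import torch

    def tv_loss(depth_map):
        """Total variation on a per-view (H, W) depth map: neighboring
        per-pixel Gaussians are encouraged to vary smoothly in depth."""
        dx = (depth_map[:, 1:] - depth_map[:, :-1]).abs().mean()
        dy = (depth_map[1:, :] - depth_map[:-1, :]).abs().mean()
        return dx + dy

    def project(points, K, world_to_cam):
        """Pinhole projection of (N, 3) world points into a target view."""
        pts_hom = torch.cat([points, torch.ones(points.shape[0], 1)], dim=-1)
        pts_cam = (pts_hom @ world_to_cam.T)[:, :3]
        uvw = pts_cam @ K.T
        return uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)

    def flow_consistency_loss(centers_i, flow_i_to_j, K, world_to_cam_j):
        """centers_i: (H, W, 3) Gaussian centers tied to the pixels of view i.
        flow_i_to_j: (H, W, 2) optical flow from view i to view j.
        A Gaussian attached to pixel (u, v) of view i, projected into view j,
        should land near (u, v) + flow_i_to_j[v, u]."""
        H, W, _ = centers_i.shape
        v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
        src_pix = torch.stack([u, v], dim=-1).float().reshape(-1, 2)
        targets = src_pix + flow_i_to_j.reshape(-1, 2)    # flow-predicted matches in view j
        projected = project(centers_i.reshape(-1, 3), K, world_to_cam_j)
        return (projected - targets).abs().mean()

In practice such a flow term would only be applied where the flow is reliable (e.g., in regions visible in both views), and the exact way the decoder and total variation constraints are combined is described in the paper itself; the sketch only illustrates the general shape of the losses.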

Evaluation and Findings

Comprehensive evaluations underscore the proposed method's efficacy in rendering high-quality synthesized views, with marked improvements in texture and geometry reconstruction. The approach notably outperforms existing NeRF-based and 3DGS methodologies in sparse-view scenarios. Furthermore, the method has a distinctive advantage: it does not attempt to reconstruct occluded regions, which leaves them available for inpainting techniques to hallucinate realistic detail.

Implications and Future Directions

The introduction of coherent constraints in 3D Gaussian splatting sets a new benchmark in sparse-view 3D reconstruction. This work not only addresses the limitations of existing techniques but also opens avenues for future research, particularly in enhancing the reconstruction quality and efficiency of scene representation from minimal inputs. Potential future work could explore the extension of this methodology to more complex scene geometries and textures, including transparent and reflective surfaces, further broadening its applicability.

In conclusion, this paper presents a significant step forward in sparse-view 3D reconstruction, offering a robust framework that marries depth-based initialization with regularized optimization for coherent 3D Gaussian splatting. It promises to elevate the standards of 3D scene representation quality and efficiency, paving the way for more detailed and realistic virtual and augmented reality experiences.
