Emergent Mind

WildGaussians: 3D Gaussian Splatting in the Wild

(2407.08447)
Published Jul 11, 2024 in cs.CV

Abstract

While the field of 3D scene reconstruction is dominated by NeRFs due to their photorealistic quality, 3D Gaussian Splatting (3DGS) has recently emerged, offering similar quality with real-time rendering speeds. However, both methods primarily excel with well-controlled 3D scenes, while in-the-wild data - characterized by occlusions, dynamic objects, and varying illumination - remains challenging. NeRFs can adapt to such conditions easily through per-image embedding vectors, but 3DGS struggles due to its explicit representation and lack of shared parameters. To address this, we introduce WildGaussians, a novel approach to handle occlusions and appearance changes with 3DGS. By leveraging robust DINO features and integrating an appearance modeling module within 3DGS, our method achieves state-of-the-art results. We demonstrate that WildGaussians matches the real-time rendering speed of 3DGS while surpassing both 3DGS and NeRF baselines in handling in-the-wild data, all within a simple architectural framework.

Figure: Comparison of MSE and DSSIM uncertainty losses under appearance changes in NeRF-W.

Overview

  • WildGaussians extends 3D Gaussian Splatting (3DGS) to handle dynamic scenes with varying appearance and illumination, improving real-time rendering capabilities in uncontrolled environments.

  • The method incorporates advanced appearance and uncertainty modeling techniques, utilizing trainable embeddings and DINOv2 features to manage scene changes and transient objects effectively.

  • WildGaussians outperforms existing models such as NeRF baselines and the original 3DGS on diverse datasets, with especially strong results in high-occlusion scenarios, while maintaining high fidelity across varying conditions.


"WildGaussians: 3D Gaussian Splatting in the Wild" presents a noteworthy advancement in the domain of 3D scene reconstruction, addressing the challenges associated with unconstrained, in-the-wild data. The authors propose WildGaussians, a novel method that extends 3D Gaussian Splatting (3DGS) to handle dynamic scenes with appearance and illumination changes effectively.

Introduction and Core Contributions

3D scene reconstruction methodologies, particularly Neural Radiance Fields (NeRFs), have achieved acclaim for their photorealistic outputs. However, such models tend to falter in real-world settings characterized by occlusions, dynamic objects, and varying illumination. While 3DGS offers solutions for real-time rendering with a comparable quality to NeRFs, it too faces limitations when applied to uncontrolled environments.

WildGaussians introduces two major enhancements to 3DGS:

Appearance Modeling:

  • Enhanced with trainable per-image and per-Gaussian embeddings, allowing the method to adapt to changes in scene appearance such as varying illumination or weather conditions.
  • Utilizes an MLP to predict affine transformations in color space, effectively enabling the model to handle per-image appearance variations.

Uncertainty Modeling:

  • Integrated with DINO-based uncertainty prediction to manage occlusions and transient objects, leveraging the robust feature extraction capabilities of DINOv2.

These enhancements are designed to preserve the real-time rendering capabilities of 3DGS while improving its robustness to real-world data inconsistencies.

Key Methodologies

Appearance Modeling

The approach incorporates trainable per-image and per-Gaussian embeddings to handle varied appearances across different viewpoints and conditions. The embeddings inform an MLP that predicts affine transformations, adjusting the colors of each Gaussian to match the target appearance. This mechanism keeps renderings faithful even under substantial changes in appearance.
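The idea can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's exact architecture: the embedding dimensions, two-layer MLP, and zero-initialized output layer (so training starts from the identity color transform) are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

D_IMG, D_GAUSS, HIDDEN = 16, 8, 32  # illustrative embedding sizes
N = 1000                            # number of Gaussians

# Trainable inputs: one embedding per image, one per Gaussian.
img_embed = rng.normal(size=D_IMG)
gauss_embed = rng.normal(size=(N, D_GAUSS))
base_color = rng.uniform(size=(N, 3))  # each Gaussian's base RGB

# Tiny 2-layer MLP mapping [image | Gaussian] embeddings to affine
# color parameters (gamma, beta). The last layer starts at zero so
# the initial transform is the identity.
W1 = rng.normal(scale=0.1, size=(D_IMG + D_GAUSS, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = np.zeros((HIDDEN, 6))  # 3 values for gamma, 3 for beta
b2 = np.zeros(6)

def apply_appearance(base_color, img_embed, gauss_embed):
    # Concatenate the shared per-image embedding with each Gaussian's own.
    x = np.concatenate(
        [np.broadcast_to(img_embed, (len(gauss_embed), D_IMG)), gauss_embed],
        axis=1,
    )
    h = np.maximum(x @ W1 + b1, 0.0)  # ReLU hidden layer
    params = h @ W2 + b2
    gamma = 1.0 + params[:, :3]       # multiplicative term
    beta = params[:, 3:]              # additive term
    return gamma * base_color + beta  # per-Gaussian affine in color space

toned = apply_appearance(base_color, img_embed, gauss_embed)
```

In practice the embeddings and MLP weights would be optimized jointly with the Gaussians; swapping the per-image embedding at test time then re-renders the scene under a different appearance.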

Uncertainty Modeling

To mitigate the influence of transient objects, WildGaussians integrates an uncertainty predictor based on DINOv2 features. The model assigns uncertainties to each pixel using a learnable affine transformation of these features. This setup ensures that dynamic occlusions do not adversely affect the training process, thus maintaining the integrity of static scene components.
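A hedged sketch of this kind of uncertainty weighting follows. The Gaussian negative-log-likelihood loss form, the softplus squashing, and the feature dimension are assumptions for illustration; the paper's actual loss and DINOv2 feature handling may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(1)

H, W, D = 32, 32, 64  # image size and an assumed per-pixel feature dim

features = rng.normal(size=(H, W, D))  # per-pixel DINOv2-style features
render = rng.uniform(size=(H, W, 3))   # 3DGS rendering
target = rng.uniform(size=(H, W, 3))   # ground-truth photo

# Learnable affine map from features to one log-uncertainty per pixel.
w = rng.normal(scale=0.01, size=D)
b = 0.0

def uncertainty_weighted_loss(render, target, features, w, b, eps=1e-3):
    # softplus keeps the per-pixel uncertainty strictly positive
    sigma = np.logaddexp(0.0, features @ w + b) + eps  # shape (H, W)
    sq_err = np.sum((render - target) ** 2, axis=-1)   # per-pixel error
    # Gaussian NLL form: uncertain pixels (likely transient occluders)
    # are downweighted, while the log term discourages inflating sigma
    # everywhere.
    return np.mean(sq_err / (2.0 * sigma**2) + np.log(sigma))

loss = uncertainty_weighted_loss(render, target, features, w, b)
```

Because the uncertainty is predicted from semantic features rather than raw colors, pixels on moving people or cars can be downweighted even when their appearance resembles the static scene.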

Empirical Evaluation

The proposed method was evaluated on two diverse datasets: NeRF On-the-go and Photo Tourism.

  • NeRF On-the-go Dataset: Contains indoor and outdoor sequences with varying occlusion densities. WildGaussians outperforms both NeRF On-the-go and the original 3DGS, especially in medium- to high-occlusion scenarios. Rendering roughly 400x faster than NeRF On-the-go, it suits practical deployments that demand rapid response times.
  • Photo Tourism Dataset: Comprising images captured across different times and conditions, this dataset presents a significant challenge due to its high variance. WildGaussians demonstrated superior performance against contemporary baselines, including NeRF-W, Ha-NeRF, and K-Planes, by maintaining higher PSNR and SSIM values while ensuring real-time rendering capabilities.

Detailed Analysis and Ablation Studies

The authors conducted extensive ablation studies to validate the effectiveness of their modeling strategies:

  • Removing appearance modeling resulted in substantial performance drops in environments with high appearance variability, underscoring the need for localized color adjustments.
  • Disabling the uncertainty module notably degraded the model's robustness in high occlusion settings, highlighting its role in filtering out transient objects effectively.

Implications and Future Work

The implications of WildGaussians are twofold:

  • Practical Applications: This method is poised to benefit fields requiring real-time photorealistic scene reconstructions such as virtual reality (VR), robotics, and immersive media content creation.
  • Theoretical Contributions: It advances the understanding of how explicit scene representations can be made adaptable to dynamic and unconstrained environments.

Future developments in this area could explore integrating additional priors or pre-trained models like diffusion models to further enhance robustness. Moreover, extending the methodology to capture more complex illuminative phenomena such as highlights and reflections may yield even more realistic renderings.

Conclusion

WildGaussians represents a significant stride in adapting 3DGS for uncontrolled, real-world data, ensuring high-fidelity renderings amidst dynamic conditions and rapid scene changes. By implementing sophisticated appearance and uncertainty modeling, this method aligns closely with the diverse needs of modern 3D scene reconstruction applications, achieving a commendable balance between quality and efficiency.
