
Abstract

We propose MVSplat, an efficient feed-forward 3D Gaussian Splatting model learned from sparse multi-view images. To accurately localize the Gaussian centers, we propose to build a cost volume representation via plane sweeping in the 3D space, where the cross-view feature similarities stored in the cost volume can provide valuable geometry cues to the estimation of depth. We learn the Gaussian primitives' opacities, covariances, and spherical harmonics coefficients jointly with the Gaussian centers while only relying on photometric supervision. We demonstrate the importance of the cost volume representation in learning feed-forward Gaussian Splatting models via extensive experimental evaluations. On the large-scale RealEstate10K and ACID benchmarks, our model achieves state-of-the-art performance with the fastest feed-forward inference speed (22 fps). Compared to the latest state-of-the-art method pixelSplat, our model uses $10\times$ fewer parameters and infers more than $2\times$ faster while providing higher appearance and geometry quality as well as better cross-dataset generalization.

Figure: MVSplat combines posed images, Transformer features, and a U-Net to predict and render novel 3D views.

Overview

  • MVSplat introduces an efficient method for 3D Gaussian Splatting, enhancing 3D scene reconstruction and novel view synthesis from sparse multi-view images.

  • Leverages cost volume representation for depth estimation and regresses 3D Gaussian primitives' parameters for high-quality 3D reconstruction.

  • Achieves state-of-the-art performance on the RealEstate10K and ACID benchmarks with the fastest feed-forward inference speed (22 fps), demonstrating superior rendering quality and cross-dataset generalization.

  • Holds significant implications for digital scene reconstruction, AI and robotics, and opens new avenues for future research in computer vision.

MVSplat: Advancing 3D Reconstruction with Efficient Gaussian Splatting

Introduction to MVSplat

The paper introduces MVSplat, a feed-forward model for efficient 3D Gaussian Splatting, designed to reconstruct and synthesize scenes from sparse multi-view images. The method stands out by combining a cost volume representation with direct regression of 3D Gaussian primitives, targeting 3D scene reconstruction and novel view synthesis. MVSplat offers a blend of high rendering quality, rapid inference, and compact model size.

Key Contributions

MVSplat brings several innovations and contributions to the field of 3D reconstruction:

  • Cost Volume Construction: It employs a cost volume representation to localize the 3D Gaussian centers, using cross-view feature similarities as geometry cues for accurate depth estimation.
  • Efficient 3D Gaussian Primitives Regression: The model regresses the 3D Gaussian primitives' parameters (opacity, covariance, and spherical harmonics color) directly from sparse images, relying only on photometric supervision without explicit 3D geometry supervision (see the sketch after this list).
  • State-of-the-art Performance: On benchmark datasets RealEstate10K and ACID, MVSplat achieves top-tier performance coupled with the fastest inference speed among feed-forward models, demonstrating enhanced appearance quality, geometry fidelity, and impressive cross-dataset generalization capabilities.
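
As an illustration of the second contribution, the sketch below shows how per-pixel Gaussian parameters might be regressed from a feature map with a single convolutional head. This is a minimal, hypothetical example in PyTorch, not the authors' architecture; the feature dimension, SH degree, and activation choices are assumptions.

```python
# Minimal sketch (not the authors' code) of a per-pixel Gaussian parameter head,
# assuming a feature map of shape (B, C, H, W) from the image/cost-volume backbone.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianHead(nn.Module):
    """Regresses opacity, covariance (scale + rotation), and SH color per pixel."""
    def __init__(self, feat_dim: int = 64, sh_degree: int = 3):
        super().__init__()
        self.num_sh = 3 * (sh_degree + 1) ** 2           # RGB x number of SH basis functions
        out_dim = 1 + 3 + 4 + self.num_sh                # opacity, scale, quaternion, SH
        self.head = nn.Conv2d(feat_dim, out_dim, kernel_size=1)

    def forward(self, feats: torch.Tensor):
        raw = self.head(feats)
        opacity = torch.sigmoid(raw[:, :1])              # (B, 1, H, W), in (0, 1)
        scale = F.softplus(raw[:, 1:4])                  # positive per-axis scales
        quat = F.normalize(raw[:, 4:8], dim=1)           # unit quaternion -> rotation
        sh = raw[:, 8:]                                  # spherical-harmonics coefficients
        return opacity, scale, quat, sh
```

Together with a per-pixel depth (which places the Gaussian center along the camera ray), these outputs fully parameterize one Gaussian per pixel.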

Theoretical Underpinnings

MVSplat's core mechanism is the construction of a cost volume via plane sweeping, which encodes cross-view feature similarities at a set of candidate depths and turns multi-view depth estimation into a feature-matching problem. Recasting the task from unconstrained 3D regression to feature matching substantially reduces the learning difficulty and improves the model's robustness and performance. A minimal sketch of this construction follows.
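
The sketch below illustrates the general plane-sweep idea under simplified assumptions (a single source view, shared feature dimensions, and a hypothetical `project_to_other_view` helper that returns the sampling grid for a given depth plane); it is not the paper's exact formulation.

```python
# Minimal plane-sweep cost-volume sketch (illustrative, not the paper's exact formulation).
import torch
import torch.nn.functional as F

def plane_sweep_cost_volume(feat_ref, feat_src, depth_candidates, project_to_other_view):
    """feat_ref, feat_src: (B, C, H, W); depth_candidates: (D,) depths in the reference frame.
    project_to_other_view(depth) -> (B, H, W, 2) sampling grid in [-1, 1] for grid_sample
    (hypothetical helper encapsulating the camera poses and intrinsics)."""
    B, C, H, W = feat_ref.shape
    costs = []
    for d in depth_candidates:
        grid = project_to_other_view(d)                     # where each ref pixel lands in src
        warped = F.grid_sample(feat_src, grid, align_corners=True)
        # cross-view feature similarity (dot product), one channel per depth plane
        costs.append((feat_ref * warped).sum(dim=1, keepdim=True) / C ** 0.5)
    cost = torch.cat(costs, dim=1)                          # (B, D, H, W) cost volume
    prob = torch.softmax(cost, dim=1)                       # per-pixel depth distribution
    depth = (prob * depth_candidates.view(1, -1, 1, 1)).sum(dim=1)  # expected depth per pixel
    return cost, depth
```

The key point is that each channel of the cost volume stores how well the two views' features agree at one candidate depth, so the depth with the highest agreement can be read off by a soft argmax.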

Experimental Results

Extensive experiments validate MVSplat's superiority, particularly highlighting:

  1. High-Quality Outputs: It produces higher-quality renders than leading models such as pixelSplat, with better appearance and geometry fidelity reflected in improved PSNR, SSIM, and LPIPS scores (a small PSNR helper is sketched after this list).
  2. Model Efficiency and Speed: MVSplat shows remarkable improvements in efficiency, using $10\times$ fewer parameters and offering more than $2\times$ faster inference, facilitating real-world applicability.
  3. Generalization Capability: It maintains robust performance across datasets without retraining, underscoring strong generalization to diverse, unseen environments.
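
For reference, PSNR (one of the reported metrics) is computed from the mean squared error between a rendered view and the ground-truth image; the small helper below assumes both are float tensors in [0, 1].

```python
# PSNR between a rendered view and the ground truth, assuming values in [0, 1].
import torch

def psnr(rendered: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    mse = torch.mean((rendered - target) ** 2)
    return -10.0 * torch.log10(mse)   # equals 10 * log10(MAX^2 / MSE) with MAX = 1; higher is better
```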

Practical and Theoretical Implications

The advancements introduced by MVSplat hold substantial implications for both theoretical and practical applications:

  • Enhanced Scene Reconstruction: By efficiently synthesizing high-fidelity 3D structures from sparse viewpoints, MVSplat pushes the boundaries of what's possible in digital scene reconstruction, enabling more accurate and detailed digital twins and virtual environments.
  • AI and Robotics Applications: The efficiency and accuracy of MVSplat pave the way for real-time 3D mapping and navigation tasks in robotics and augmented reality systems, broadening the horizons for autonomous systems' interaction with their surroundings.
  • Future Directions in Research: The success of MVSplat in leveraging cost volume for 3D Gaussian Splatting models opens new research avenues, particularly in exploring further optimizations and applications of this methodology in other domains of computer vision and AI.

Conclusion

MVSplat represents a notable advance in 3D scene reconstruction and novel view synthesis. Through its innovative use of cost volume representation and efficient 3D Gaussian primitives regression, it sets new standards for model efficiency, reconstruction quality, and generalization. These qualities not only make it an excellent tool for current applications but also lay a foundation for future explorations in the domain of 3D computer vision.
