InstantSplat: Sparse-view Gaussian Splatting in Seconds (2403.20309v5)

Published 29 Mar 2024 in cs.CV

Abstract: While neural 3D reconstruction has advanced substantially, its performance degrades significantly with sparse-view data: SfM is often unreliable in sparse-view scenarios because feature matches are scarce, which limits broader applicability. In this paper, we introduce InstantSplat, a novel approach for addressing sparse-view 3D scene reconstruction at lightning-fast speed. InstantSplat employs a self-supervised framework that optimizes the 3D scene representation and camera poses by unprojecting 2D pixels into 3D space and aligning them using differentiable neural rendering. The optimization is initialized with a large-scale pre-trained geometric foundation model, which provides dense priors that yield initial points through model inference; all scene parameters are then further optimized using photometric errors. To mitigate redundancy introduced by the prior model, we propose a co-visibility-based geometry initialization, and a Gaussian-based bundle adjustment is employed to rapidly adapt both the scene representation and the camera parameters without relying on a complex adaptive density control process. Overall, InstantSplat is compatible with multiple point-based representations for view synthesis and surface reconstruction. It achieves an acceleration of over 30x in reconstruction and improves visual quality (SSIM) from 0.3755 to 0.7624 compared to traditional SfM with 3D-GS.

References (43)
  1. Novel view synthesis in tensor space. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 1034–1040. IEEE, 1997.
  2. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5460–5469, 2022a.
  3. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5470–5479, 2022b.
  4. Zip-NeRF: Anti-aliased grid-based neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 19697–19705, 2023.
  5. NoPe-NeRF: Optimising neural radiance field with no pose prior. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4160–4169, 2023.
  6. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9650–9660, 2021.
  7. NeuRBF: A neural fields representation with adaptive radial basis functions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4182–4194, 2023.
  8. LU-NeRF: Scene and pose estimation by synchronizing local unposed NeRFs. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 18312–18321, 2023.
  9. Depth-supervised NeRF: Fewer views and faster training for free. arXiv preprint arXiv:2107.02791, 2021.
  10. COLMAP-free 3D Gaussian splatting. arXiv preprint arXiv:2312.07504, 2023.
  11. Putting NeRF on a diet: Semantically consistent few-shot view synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5885–5894, 2021.
  12. Self-calibrating neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5846–5854, 2021.
  13. 3D Gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics (TOG), 42(4):1–14, 2023.
  14. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics (TOG), 36(4):1–13, 2017.
  15. CorresNeRF: Image correspondence priors for neural radiance fields. Advances in Neural Information Processing Systems, 36, 2024.
  16. BARF: Bundle-adjusting neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5741–5751, 2021.
  17. Progressively optimized local radiance fields for robust view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16539–16548, 2023.
  18. Local light field fusion: Practical view synthesis with prescriptive sampling guidelines. ACM Transactions on Graphics (TOG), 38(4):1–14, 2019.
  19. NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
  20. RegNeRF: Regularizing neural radiance fields for view synthesis from sparse inputs. arXiv preprint arXiv:2112.00724, 2021.
  21. Soft 3D reconstruction for view synthesis. ACM Transactions on Graphics (TOG), 36(6):1–11, 2017.
  22. F. Plastria. The Weiszfeld algorithm: Proof, amendments and extensions. In H. A. Eiselt and V. Marianov (eds.), Foundations of Location Analysis, International Series in Operations Research and Management Science, 2011.
  23. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
  24. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4104–4113, 2016.
  25. Photorealistic scene reconstruction by voxel coloring, 2002. US Patent 6,363,170.
  26. Pushing the boundaries of view extrapolation with multiplane images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 175–184, 2019.
  27. Lighthouse: Predicting lighting volumes for spatially-coherent illumination. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8080–8089, 2020.
  28. SPARF: Neural radiance fields from sparse and noisy poses. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4190–4200, 2023.
  29. Shinji Umeyama. Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on Pattern Analysis & Machine Intelligence, 13(4):376–380, 1991.
  30. SparseNeRF: Distilling depth ranking for few-shot novel view synthesis. arXiv preprint arXiv:2303.16196, 2023a.
  31. DUSt3R: Geometric 3D vision made easy. arXiv preprint arXiv:2312.14132, 2023b.
  32. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
  33. NeRF--: Neural radiance fields without known camera parameters. arXiv preprint arXiv:2102.07064, 2021.
  34. ReconFusion: 3D reconstruction with diffusion priors. arXiv preprint arXiv:2312.02981, 2023.
  35. SparseGS: Real-time 360° sparse view synthesis using Gaussian splatting. arXiv preprint arXiv:2312.00206, 2023.
  36. SinNeRF: Training neural radiance fields on complex scenes from a single image. In European Conference on Computer Vision, pages 736–753. Springer, 2022a.
  37. Point-NeRF: Point-based neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5438–5448, 2022b.
  38. FreeNeRF: Improving few-shot neural rendering with free frequency regularization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8254–8263, 2023.
  39. pixelNeRF: Neural radiance fields from one or few images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4578–4587, 2021.
  40. MVImgNet: A large-scale dataset of multi-view images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9150–9161, 2023.
  41. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.
  42. FSGS: Real-time few-shot view synthesis using Gaussian splatting. arXiv preprint arXiv:2312.00451, 2023.
  43. EWA volume splatting. In Proceedings Visualization 2001 (VIS '01), pages 29–538. IEEE, 2001.

Summary

  • The paper introduces InstantSplat, a unified framework that integrates 3D Gaussian splatting with dense stereo priors to address sparse-view and pose-free novel view synthesis challenges.
  • It employs a two-stage process: Coarse Geometric Initialization for fast scene and camera estimation, followed by Fast 3D-Gaussian Optimization to refine all parameters efficiently.
  • Experimental results on Tanks & Temples demonstrate a 32% increase in SSIM and an 80% reduction in ATE, highlighting significant improvements in rendering quality and pose accuracy.

InstantSplat: Efficient and Unified Framework for Sparse-View 3D Reconstruction and Novel View Synthesis

Introduction

InstantSplat introduces a novel methodology for addressing the challenges of novel view synthesis (NVS) under sparse-view and pose-free conditions. By integrating 3D Gaussian Splatting with dense stereo priors, InstantSplat establishes itself as a potent solution for reconstructing 3D scenes and synthesizing novel views with high fidelity. The framework distinguishes itself by significantly improving both pose estimation accuracy and rendering quality, backed by strong numerical results on the Tanks and Temples dataset. The result is a robust system that can reconstruct large-scale scenes within one minute, marking a notable advance in the field of 3D computer vision.

Key Contributions

  • InstantSplat innovates by leveraging 3D Gaussian Splatting with dense stereo priors derived from an end-to-end dense stereo model (DUSt3R), effectively tackling sparse-view and pose-free challenges in NVS.
  • The framework encompasses two main components: a Coarse Geometric Initialization (CGI) module for rapid preliminary scene structure and camera parameter estimation, and a Fast 3D-Gaussian Optimization (F-3DGO) module for joint optimization of 3D Gaussian attributes and initialized poses.
  • Demonstrated improvements include a 32% increase in SSIM and an 80% reduction in Absolute Trajectory Error (ATE) on the Tanks & Temples dataset compared to existing methods, evidencing its capability to maintain high rendering quality and accurate pose estimation in sparse-view, pose-free scenarios (a sketch of how SSIM is computed follows this list).
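
For context, SSIM (the structural similarity index; reference 32) compares local luminance, contrast, and structure statistics between a rendered view and its ground-truth image. Below is a minimal evaluation sketch using scikit-image's reference implementation; the image pair and test split are hypothetical, not part of the paper's code:

```python
# SSIM evaluation sketch (not the paper's code): scores a rendered view
# against its ground-truth image using scikit-image.
import numpy as np
from skimage.metrics import structural_similarity as ssim

def ssim_score(rendered: np.ndarray, ground_truth: np.ndarray) -> float:
    """Both inputs are HxWx3 float arrays with values in [0, 1]."""
    return ssim(rendered, ground_truth, channel_axis=-1, data_range=1.0)

# Hypothetical usage: average SSIM over a held-out test split.
# mean_ssim = np.mean([ssim_score(r, gt) for r, gt in test_pairs])
```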

Methodology Overview

Coarse Geometric Initialization (CGI)

The CGI module harnesses the dense stereo model, DUSt3R, to predict globally aligned 3D point maps from sparse-view images. This alignment furnishes an initial geometric and photometric context that facilitates the rapid estimation of preliminary scene structure and camera parameters.
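
To make the initialization concrete, here is a minimal sketch under assumed interfaces, not the paper's code: `predict_pointmaps` stands in for a DUSt3R-style model that returns globally aligned per-view point maps with confidence maps, and per-view poses are recovered with a standard PnP solve against those points.

```python
# Coarse geometric initialization sketch (assumed interfaces). A dense
# prior model supplies per-pixel 3D points in a shared world frame; each
# camera pose is then recovered from 2D pixel / 3D point correspondences.
import numpy as np
import cv2

def init_scene(images, predict_pointmaps):
    """images: list of HxWx3 uint8 arrays.
    predict_pointmaps: hypothetical callable returning, per view, an HxWx3
    array of 3D world points plus an HxW confidence map."""
    pointmaps, confidences = predict_pointmaps(images)
    H, W = images[0].shape[:2]
    # Simple pinhole intrinsic guess; the focal length could also be
    # recovered from the point maps themselves.
    f = max(H, W)
    K = np.array([[f, 0, W / 2], [0, f, H / 2], [0, 0, 1]], dtype=np.float64)

    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v], axis=-1).reshape(-1, 2).astype(np.float64)
    poses, points, colors = [], [], []
    for img, pm, conf in zip(images, pointmaps, confidences):
        keep = conf.reshape(-1) > 0.5  # drop low-confidence pixels
        obj = pm.reshape(-1, 3)[keep].astype(np.float64)
        # PnP recovers the camera pose from the retained correspondences.
        ok, rvec, tvec, _ = cv2.solvePnPRansac(obj, pix[keep], K, None)
        poses.append((rvec, tvec))
        points.append(obj)
        colors.append(img.reshape(-1, 3)[keep])
    return np.vstack(points), np.vstack(colors), poses
```

The key design point is that the dense prior supplies 3D structure directly, so no feature matching or incremental SfM is needed; low-confidence pixels are simply discarded before the pose solve.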

Fast 3D-Gaussian Optimization (F-3DGO)

Following CGI, the F-3DGO module uses these initial estimates to jointly refine the 3D Gaussian attributes and camera poses. It applies pose regularization, substantially enhancing final pose accuracy and rendering quality through an efficient optimization process.
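
A minimal sketch of what such a joint refinement loop could look like in PyTorch; the differentiable rasterizer `render`, the Gaussian parameter layout, and the se(3)-style pose parameterization are all assumptions rather than the paper's API:

```python
# Joint refinement sketch (assumed interfaces): optimize Gaussian attributes
# and small pose corrections against a photometric loss, with an L2 penalty
# keeping poses near their coarse initialization.
import torch

def refine(gaussians, poses_init, images, render, iters=1000, lam=1e-2):
    """gaussians: dict of leaf tensors (means, scales, rotations, opacities,
    colors) with requires_grad=True. poses_init: (N, 6) se(3)-style vectors.
    render: hypothetical differentiable rasterizer(gaussians, pose) -> image."""
    pose_delta = torch.zeros_like(poses_init, requires_grad=True)
    params = list(gaussians.values()) + [pose_delta]
    opt = torch.optim.Adam(params, lr=1e-3)
    for _ in range(iters):
        opt.zero_grad()
        loss = 0.0
        for i, gt in enumerate(images):
            pred = render(gaussians, poses_init[i] + pose_delta[i])
            loss = loss + (pred - gt).abs().mean()   # photometric L1
        loss = loss + lam * pose_delta.pow(2).sum()  # pose regularization
        loss.backward()
        opt.step()
    return gaussians, poses_init + pose_delta.detach()
```

Because both the Gaussian attributes and the pose corrections receive gradients from the same photometric loss, this behaves like a lightweight, rasterization-based bundle adjustment, with the regularizer anchoring poses to their coarse initialization.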

Experimental Insights

Extensive evaluations on the outdoor Tanks & Temples dataset underscore InstantSplat's superiority in sparse-view, pose-free scenarios. The method not only significantly outperforms existing pose-free methods in rendering quality but also shows marked improvements in pose estimation accuracy.
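
For reference, ATE is conventionally reported after aligning the estimated camera trajectory to the ground truth with a similarity transform, e.g. via the Umeyama method (reference 29). A minimal sketch of that metric:

```python
# ATE sketch: align estimated camera centers to ground truth with a
# similarity transform (Umeyama, 1991), then report the RMSE of residuals.
import numpy as np

def ate_rmse(est: np.ndarray, gt: np.ndarray) -> float:
    """est, gt: (N, 3) camera centers in corresponding order."""
    mu_e, mu_g = est.mean(0), gt.mean(0)
    E, G = est - mu_e, gt - mu_g
    U, S, Vt = np.linalg.svd(G.T @ E / len(est))   # cross-covariance SVD
    D = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:                  # guard against reflections
        D[2, 2] = -1
    R = U @ D @ Vt                                 # rotation aligning est -> gt
    s = np.trace(np.diag(S) @ D) / E.var(0).sum()  # optimal scale
    t = mu_g - s * R @ mu_e                        # optimal translation
    aligned = (s * (R @ est.T)).T + t
    return float(np.sqrt(((aligned - gt) ** 2).sum(1).mean()))
```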

Theoretical and Practical Implications

On a theoretical level, InstantSplat presents a novel approach to NVS by combining explicit 3D representations with pose priors, removing the dependence on dense view coverage or prior knowledge of camera parameters. Practically, the method's efficiency and effectiveness in handling real-world scenarios indicate its potential applicability in areas such as digital twin construction, augmented reality, and beyond.

Future Directions

The current landscape of NVS under sparse-view conditions suggests a promising direction for future research: integrating learned priors more deeply with explicit 3D representations. End-to-end systems capable of reconstructing and rendering scenes from extremely sparse, unposed inputs could revolutionize 3D content creation and visualization technologies.

Summary

InstantSplat represents a significant leap towards solving the long-standing challenges in novel view synthesis, specifically in sparse-view and pose-free settings. By proficiently merging the capabilities of dense stereo models with 3D Gaussian Splatting, it offers a fast, accurate, and practicable solution for 3D scene reconstruction and rendering, paving the way for next-generation 3D vision applications.
