Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis (2312.16812v2)

Published 28 Dec 2023 in cs.CV and cs.GR

Abstract: Novel view synthesis of dynamic scenes has been an intriguing yet challenging problem. Despite recent advancements, simultaneously achieving high-resolution photorealistic results, real-time rendering, and compact storage remains a formidable task. To address these challenges, we propose Spacetime Gaussian Feature Splatting as a novel dynamic scene representation, composed of three pivotal components. First, we formulate expressive Spacetime Gaussians by enhancing 3D Gaussians with temporal opacity and parametric motion/rotation. This enables Spacetime Gaussians to capture static, dynamic, as well as transient content within a scene. Second, we introduce splatted feature rendering, which replaces spherical harmonics with neural features. These features facilitate the modeling of view- and time-dependent appearance while maintaining small size. Third, we leverage the guidance of training error and coarse depth to sample new Gaussians in areas that are challenging to converge with existing pipelines. Experiments on several established real-world datasets demonstrate that our method achieves state-of-the-art rendering quality and speed, while retaining compact storage. At 8K resolution, our lite-version model can render at 60 FPS on an Nvidia RTX 4090 GPU. Our code is available at https://github.com/oppo-us-research/SpacetimeGaussians.


Summary

  • The paper introduces a dynamic scene representation built from Spacetime Gaussians, which capture static, dynamic, and transient content via temporal opacity and parametric motion/rotation.
  • It splats compact neural features instead of traditional spherical harmonics, keeping storage small while still modeling view- and time-dependent appearance.
  • Guided sampling based on training error and coarse depth places new Gaussians in hard-to-converge regions; the lite-version model renders 8K video at 60 FPS on an Nvidia RTX 4090 GPU.

Introduction to Spacetime Gaussian Feature Splatting

Rendering photorealistic views of dynamic scenes in real time has long been a challenge in computer vision and graphics. Simultaneously achieving high resolution, real-time rendering, and compact storage is particularly demanding. Technologies that let users explore dynamic scenes from novel viewpoints are of great interest due to their applications in virtual and augmented reality, broadcasting, and education.

Innovations in Dynamic View Synthesis

This paper addresses the delicate balance between rendering quality, speed, and storage efficiency. It proposes a new dynamic scene representation, termed Spacetime Gaussian Feature Splatting, built on three components:

  1. Spacetime Gaussians: An approach that extends 3D Gaussians by incorporating temporal opacity and parametric motion/rotation into the traditional model. This allows Spacetime Gaussians to capture static and dynamic features as well as transient content, such as objects that emerge or vanish over time (see the first sketch after this list).
  2. Splatted Feature Rendering: This technique forgoes spherical harmonics and instead splats compact neural features, which are smaller in size yet expressive enough to model view- and time-dependent appearance, contributing to the model's compactness (second sketch below).
  3. Guided Sampling: The optimization process is improved by sampling new Gaussians in areas that are difficult to render well, particularly regions that are sparsely covered or far from the cameras. Sampling is guided by training error and coarse depth, enhancing rendering quality in complex scenes (third sketch below).
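To make the first component concrete, here is a minimal NumPy sketch of how a single Spacetime Gaussian could be evaluated at a query time. The exact form of the temporal opacity, the polynomial orders, and all parameter names are illustrative assumptions inferred from the abstract's description, not the authors' implementation.

```python
import numpy as np

def evaluate_spacetime_gaussian(t, mu_tau, s_tau, sigma_s, pos_coeffs, rot_coeffs):
    """Evaluate one Spacetime Gaussian's time-varying parameters at time t.

    mu_tau:     temporal center of the Gaussian (assumed parameterization)
    s_tau:      temporal decay rate (larger = shorter-lived content)
    sigma_s:    peak spatial opacity
    pos_coeffs: list of 3-vectors, polynomial coefficients of the center
    rot_coeffs: list of 4-vectors, polynomial coefficients of the quaternion
    """
    dt = t - mu_tau

    # Temporal opacity: a 1D Gaussian in time lets the primitive fade in and
    # out, which is how transient content can be represented.
    opacity = sigma_s * np.exp(-s_tau * dt**2)

    # Parametric motion: the center follows a polynomial trajectory in time.
    position = sum(c * dt**k for k, c in enumerate(pos_coeffs))

    # Parametric rotation: a polynomial in quaternion space, renormalized.
    q = sum(c * dt**k for k, c in enumerate(rot_coeffs))
    rotation = q / np.linalg.norm(q)

    return opacity, position, rotation

# Example: a Gaussian that peaks at t = 0.5 and drifts along +x.
opacity, position, rotation = evaluate_spacetime_gaussian(
    t=0.6, mu_tau=0.5, s_tau=100.0, sigma_s=0.9,
    pos_coeffs=[np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])],
    rot_coeffs=[np.array([1.0, 0.0, 0.0, 0.0])],
)
```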
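The second component alpha-blends (splats) low-dimensional per-Gaussian features into a per-pixel feature map and decodes it with a small network. The PyTorch sketch below shows one plausible decoder; the feature dimension, hidden width, and use of the view direction as the only conditioning input are assumptions, and the paper's actual decoder may organize its features differently (e.g., separating a base color from view- and time-dependent residuals).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDecoder(nn.Module):
    """Tiny MLP that maps a splatted feature map plus view direction to RGB."""

    def __init__(self, feat_dim: int = 9, hidden: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden),  # splatted features + view direction
            nn.ReLU(),
            nn.Linear(hidden, 3),             # RGB output
        )

    def forward(self, features: torch.Tensor, view_dirs: torch.Tensor) -> torch.Tensor:
        # features:  (H, W, feat_dim), alpha-blended per pixel by the rasterizer
        # view_dirs: (H, W, 3), unit vectors from the camera through each pixel
        x = torch.cat([features, view_dirs], dim=-1)
        return torch.sigmoid(self.mlp(x))

# Usage on a dummy 64x64 feature map with normalized view directions.
decoder = FeatureDecoder()
rgb = decoder(torch.randn(64, 64, 9),
              F.normalize(torch.randn(64, 64, 3), dim=-1))  # -> (64, 64, 3)
```

The appeal of this design is that each pixel needs only one cheap MLP evaluation after splatting, rather than per-sample network queries along a ray, while the feature vectors remain far smaller than a full set of spherical-harmonic coefficients.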
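The third component spawns new Gaussians where optimization struggles. As a hedged sketch of the idea, the function below selects high-error pixels in a rendered training view and unprojects them to 3D using a coarse depth map, producing candidate centers for new Gaussians. The error threshold, proposal cap, and selection strategy are illustrative assumptions, not the paper's exact procedure.

```python
import torch

def propose_new_gaussians(render, target, depth, K_inv, cam_to_world,
                          err_thresh=0.1, max_new=1000):
    """Return candidate 3D centers for new Gaussians in high-error regions.

    render, target: (H, W, 3) rendered and ground-truth images
    depth:          (H, W) coarse depth estimate for this view
    K_inv:          (3, 3) inverse camera intrinsics
    cam_to_world:   (4, 4) camera-to-world transform
    """
    # Per-pixel photometric error flags regions the current Gaussians miss.
    error = (render - target).abs().mean(dim=-1)
    ys, xs = torch.nonzero(error > err_thresh, as_tuple=True)

    # Cap the number of proposals so densification stays gradual.
    if ys.numel() > max_new:
        keep = torch.randperm(ys.numel())[:max_new]
        ys, xs = ys[keep], xs[keep]

    # Unproject: pixel -> camera-space ray scaled by depth -> world space.
    pix = torch.stack([xs.float(), ys.float(),
                       torch.ones_like(xs, dtype=torch.float32)], dim=-1)
    cam_pts = (K_inv @ pix.T).T * depth[ys, xs].unsqueeze(-1)
    cam_h = torch.cat([cam_pts, torch.ones(cam_pts.shape[0], 1)], dim=-1)
    return (cam_to_world @ cam_h.T).T[:, :3]
```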

State-of-the-Art Performance

Experiments on several established real-world datasets show that the representation achieves state-of-the-art rendering quality and speed while maintaining a small model size. At 8K resolution, the lite-version model renders at 60 frames per second on an Nvidia RTX 4090 GPU.

Contributions and Applications

This research presents several notable contributions:

  • A Spacetime Gaussian model that efficiently renders dynamic views with high fidelity.
  • A rendering technique based on splatted neural features rather than traditional spherical harmonics, improving the model's compactness.
  • A guided sampling method that refines rendering quality by focusing on challenging areas.
  • Extensive testing on real-world datasets demonstrating that the method surpasses the prior state of the art in rendering quality and speed while keeping model size compact.

Conclusion and Future Work

The introduction of Spacetime Gaussian Feature Splatting marks a significant advance in dynamic view synthesis. By addressing the key challenges of rendering quality, speed, and model compactness, this technology is poised to enhance user experiences across multiple applications. However, the representation is not without limitations; it currently requires multi-view video inputs and cannot be trained on-the-fly. Future explorations may include adapting the model for monocular settings and improving its training efficiency to support streaming applications.
