- The paper introduces a novel spatial-temporal representation that extends 3D Gaussian Splatting into 4D for dynamic scene synthesis.
- It leverages a CUDA-optimized framework to achieve real-time rendering at up to 583 FPS while reaching a PSNR of 31.62 on the Plenoptic Video Dataset.
- The method employs 4D rotors from geometric algebra, together with entropy and 4D consistency losses, to stabilize and improve dynamic scene reconstruction.
Introduction
4D Gaussian Splatting (4DGS) is a novel approach to novel view synthesis (NVS) for dynamic scenes, a task intrinsically harder than its static counterpart because of the added temporal dimension and the variety of motion patterns. Methods built on a canonical space plus a deformation field often struggle to render high-fidelity detail and to capture abrupt motions, while volumetric methods fall short of real-time rendering because of the computational cost of densely sampled rays.
Novel Approach
The paper presents a spatial-temporal representation that extends 3D Gaussian Splatting (3DGS) into the temporal domain. Dynamic scenes are encoded with anisotropic 4D XYZT Gaussians, yielding an explicit model in which scene dynamics arise naturally from the primitives themselves. Slicing these 4D Gaussians at a given timestamp produces dynamic 3D Gaussians that are splatted onto the image plane, inheriting both the rendering quality and the speed of 3DGS. To address the non-trivial questions of how to parameterize 4D rotation and how to optimize jointly over space and time, the authors turn to geometric algebra and introduce a 4D rotor-based rotation representation that is intuitive yet expressive.
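To make the slicing step concrete, here is one standard way it can be realized, written in our own notation; it illustrates the mechanism rather than the paper's exact formulation. With the 4D mean and covariance in block form, spatial part $x \in \mathbb{R}^3$ and time $t$:

$$
\mu = \begin{pmatrix} \mu_x \\ \mu_t \end{pmatrix},
\qquad
\Sigma = \begin{pmatrix} \Sigma_{xx} & \Sigma_{xt} \\ \Sigma_{tx} & \Sigma_{tt} \end{pmatrix},
$$

conditioning on a query time $t$ gives a 3D Gaussian with

$$
\mu_{x \mid t} = \mu_x + \Sigma_{xt}\,\Sigma_{tt}^{-1}\,(t - \mu_t),
\qquad
\Sigma_{x \mid t} = \Sigma_{xx} - \Sigma_{xt}\,\Sigma_{tt}^{-1}\,\Sigma_{tx},
$$

while the temporal marginal $\exp\!\left(-\tfrac{(t - \mu_t)^2}{2\,\Sigma_{tt}}\right)$ can scale the primitive's opacity, so each Gaussian naturally appears, moves, and fades over the course of the video. The 4D rotor plays the role that a quaternion plays in 3DGS: an even-grade multivector with eight components spanning the six rotation planes ($xy$, $xz$, $yz$, $xt$, $yt$, $zt$), which lets a single primitive encode both spatial orientation and the space-time tilt that corresponds to motion.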
CUDA-Optimized Implementation
Temporal slicing and splatting are implemented in a highly optimized CUDA framework, reaching real-time inference speeds of up to 277 FPS on an RTX 3090 GPU and 583 FPS on an RTX 4090 GPU. Training is regularized with an entropy loss and a novel 4D consistency loss that stabilize and improve dynamic reconstruction: floaters are suppressed, fine details are enhanced, and motion remains temporally consistent. Both regularizations yield measurable gains in the paper's quantitative and qualitative evaluations.
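The exact loss formulations are defined in the paper and its released code; the PyTorch-style sketch below only conveys the intuition behind the two regularizers. The names (`entropy_loss`, `consistency_loss`) and the use of a per-Gaussian `velocities` tensor as the motion quantity are our assumptions, made to keep the example self-contained.

```python
import torch


def entropy_loss(opacity: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Entropy regularizer on per-Gaussian opacities (sketch).

    Minimizing -o * log(o) pushes opacities toward 0 or 1, so
    semi-transparent "floaters" are either pruned or solidified.
    """
    o = opacity.clamp(eps, 1.0 - eps)
    return (-o * torch.log(o)).mean()


def consistency_loss(positions: torch.Tensor,
                     velocities: torch.Tensor,
                     k: int = 8) -> torch.Tensor:
    """Hypothetical 4D consistency regularizer (sketch).

    Encourages each Gaussian's motion (abstracted here as a per-Gaussian
    velocity vector) to agree with its k nearest spatial neighbors,
    keeping nearby primitives moving coherently.
    """
    # O(N^2) distance matrix is fine for a sketch; a real implementation
    # would use a CUDA KNN to scale to millions of Gaussians.
    dists = torch.cdist(positions, positions)                   # (N, N)
    knn_idx = dists.topk(k + 1, largest=False).indices[:, 1:]   # drop self
    neighbor_vel = velocities[knn_idx].mean(dim=1)              # (N, 3)
    return (velocities - neighbor_vel).abs().mean()
```

In the actual system such terms would be evaluated inside the CUDA training loop over millions of Gaussians, which is why the method's speed claims depend on the optimized implementation rather than on the loss definitions themselves.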
Superior Results
The empirical results are convincing. On the Plenoptic Video Dataset, 4DGS outperforms prior state-of-the-art methods with the highest reported PSNR of 31.62, while rendering high-resolution video with a clear speed advantage and superior reconstruction quality. On monocular videos from the D-NeRF Dataset, 4DGS delivers a substantial jump in rendering quality and reaches 1258 FPS, far faster than previously reported methods.
Conclusion
In summary, 4DGS advances NVS for dynamic scenes by providing a practical spatial-temporal representation and sets a new benchmark for speed and rendering fidelity. As a unified framework suitable for both static and dynamic environments, its performance and adaptability hold great promise for future industrial applications in VR/AR, gaming, and film production. In addition, the release of the code offers a valuable asset for the community and should stimulate further research and development in dynamic scene rendering.