
Abstract

We consider the problem of novel view synthesis (NVS) for dynamic scenes. Recent neural approaches have achieved exceptional NVS results for static 3D scenes, but extensions to 4D time-varying scenes remain non-trivial. Prior efforts often encode dynamics by learning a canonical space plus implicit or explicit deformation fields, which struggle in challenging scenarios such as sudden movements or when rendering high-fidelity detail. In this paper, we introduce 4D Gaussian Splatting (4DGS), a novel method that represents dynamic scenes with anisotropic 4D XYZT Gaussians, inspired by the success of 3D Gaussian Splatting in static scenes. We model dynamics at each timestamp by temporally slicing the 4D Gaussians, which naturally yields dynamic 3D Gaussians that can be seamlessly projected into images. As an explicit spatial-temporal representation, 4DGS demonstrates powerful capabilities for modeling complicated dynamics and fine details, especially for scenes with abrupt motions. We further implement our temporal slicing and splatting techniques in a highly optimized CUDA framework, achieving real-time rendering speeds of up to 277 FPS on an RTX 3090 GPU and 583 FPS on an RTX 4090 GPU. Rigorous evaluations on scenes with diverse motions showcase the superior efficiency and effectiveness of 4DGS, which consistently outperforms existing methods both quantitatively and qualitatively.

Overview

  • 4D Gaussian Splatting (4DGS) introduces a new approach to novel view synthesis for dynamic scenes, focusing on high-quality rendering and real-time performance.

  • The method builds upon 3D Gaussian Splatting by extending it into the temporal domain, creating an anisotropic 4D representation for natural scene dynamics.

  • A CUDA-optimized implementation achieves real-time rendering speeds and introduces novel loss functions for enhanced dynamic scene reconstruction.

  • Empirical results demonstrate superior performance over prior methods in rendering quality and speed, promising advancements in the VR/AR, gaming, and film industries.

Introduction

4D Gaussian Splatting (4DGS) is a novel approach that takes on the challenge of novel view synthesis (NVS) for dynamic scenes, a task intrinsically more complex than its static counterpart due to the extra temporal dimension and varying motion patterns. Traditional methods that rely on a canonical space plus deformation fields often struggle with high-fidelity rendering and with depicting abrupt motion, while volumetric methods fall short of real-time rendering due to the heavy computational demand of densely sampled rays.

Novel Approach

The paper presents a spatial-temporal representation that extends 3D Gaussian Splatting (3DGS) into the temporal domain. By encoding dynamic scenes with anisotropic 4D XYZT Gaussians, the method obtains an explicit model that composes scene dynamics naturally. Slicing these 4D Gaussians at a given timestamp yields dynamic 3D Gaussians that can be seamlessly projected onto images, retaining both the high rendering quality and the fast rendering speed inherited from 3DGS. To address the inherent challenges of designing 4D rotation features and a spatial-temporal optimization scheme, the authors employ geometric algebra and introduce a 4D rotor-based representation that offers intuitive and powerful rotation handling.
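To make the slicing step concrete, the sketch below conditions an XYZT Gaussian on a timestamp using standard Gaussian conditioning. It is a minimal illustrative sketch: the function name, the NumPy implementation, and the use of the temporal marginal as an opacity weight are assumptions for exposition, not the paper's CUDA kernel or its rotor-based covariance parameterization.

```python
import numpy as np

def slice_4d_gaussian(mu, cov, t):
    """Condition a 4D XYZT Gaussian on time t (illustrative sketch).
    Returns the 3D mean, 3D covariance, and a temporal weight that could
    modulate the Gaussian's opacity at this timestamp."""
    mu_x, mu_t = mu[:3], mu[3]          # spatial mean, temporal mean
    cov_xx = cov[:3, :3]                # spatial covariance block
    cov_xt = cov[:3, 3]                 # spatial-temporal coupling
    cov_tt = cov[3, 3]                  # temporal variance

    # Conditional 3D Gaussian at timestamp t (standard Gaussian conditioning)
    mean_3d = mu_x + cov_xt * (t - mu_t) / cov_tt
    cov_3d = cov_xx - np.outer(cov_xt, cov_xt) / cov_tt

    # Unnormalized temporal marginal: how "visible" the Gaussian is at t
    w_t = np.exp(-0.5 * (t - mu_t) ** 2 / cov_tt)
    return mean_3d, cov_3d, w_t

# Example: slice a 4D Gaussian slightly after its temporal center
mu = np.array([0.0, 0.0, 0.0, 0.5])
cov = np.diag([0.1, 0.1, 0.1, 0.05])
mean_3d, cov_3d, w_t = slice_4d_gaussian(mu, cov, t=0.6)
print(mean_3d, w_t)
```

Note that with a non-zero spatial-temporal coupling term, the sliced 3D mean moves as t varies, which is what lets a single 4D Gaussian represent motion.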

CUDA-Optimized Implementation

Temporal slicing and splatting are implemented in a highly optimized CUDA framework, achieving real-time inference speeds of up to 277 FPS on an RTX 3090 GPU and 583 FPS on an RTX 4090 GPU. Training is further regularized with an entropy loss and a novel 4D consistency loss that stabilize and improve dynamic reconstruction, suppressing "floaters," enhancing detail, and keeping the reconstructed dynamics consistent over time. These regularizations yield noticeable improvements in rigorous quantitative and qualitative evaluations.
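As a rough illustration of how such regularizers might look, the PyTorch sketch below implements a plausible opacity entropy term and a neighbor-based motion consistency term. The exact formulations, the loss weights, and the per-Gaussian `velocities` tensor are assumptions made for this example and are not taken from the paper.

```python
import torch

def opacity_entropy_loss(opacities, eps=1e-6):
    """Illustrative entropy regularizer: pushes per-Gaussian opacity toward
    0 or 1, which tends to suppress semi-transparent floaters.
    (Plausible form, not necessarily the paper's exact definition.)"""
    o = opacities.clamp(eps, 1.0 - eps)
    return (-o * torch.log(o)).mean()

def knn_consistency_loss(positions, velocities, k=8):
    """Illustrative 4D consistency regularizer (assumed form): Gaussians
    that are close in space are encouraged to share similar motion."""
    d = torch.cdist(positions, positions)                   # (N, N) distances
    knn_idx = d.topk(k + 1, largest=False).indices[:, 1:]   # drop self-match
    neighbor_vel = velocities[knn_idx]                       # (N, k, 3)
    return (velocities.unsqueeze(1) - neighbor_vel).pow(2).mean()

# Toy usage with random per-Gaussian parameters
opacities = torch.rand(1024, requires_grad=True)
positions = torch.randn(1024, 3)
velocities = torch.randn(1024, 3, requires_grad=True)
loss = opacity_entropy_loss(opacities) + 0.05 * knn_consistency_loss(positions, velocities)
loss.backward()
```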

Superior Results

The empirical results are compelling. On the Plenoptic Video Dataset, 4DGS outperforms prior state-of-the-art methods, achieving the highest PSNR of 31.62. The approach renders high-resolution videos efficiently, offering both a notable speed advantage and superior scene reconstruction quality. On monocular videos from the D-NeRF Dataset, 4DGS delivers a substantial leap in rendering quality while reaching 1258 FPS, significantly faster than previously reported methods.

Conclusion

In summary, 4DGS not only advances NVS for dynamic scenes by providing a practical spatial-temporal representation but also sets a new benchmark in speed and rendering fidelity. Its high performance and adaptability, as a unified framework suitable for both static and dynamic environments, hold great promise for future industrial applications in VR/AR, gaming, and film production. In addition, the release of the code offers a valuable asset to the community and may stimulate further research and development in dynamic scene rendering.
