
Abstract

We consider the problem of novel view synthesis (NVS) for dynamic scenes. Recent neural approaches have achieved exceptional NVS results for static 3D scenes, but extensions to 4D time-varying scenes remain non-trivial. Prior efforts often encode dynamics by learning a canonical space plus implicit or explicit deformation fields, which struggle in challenging scenarios such as sudden movements or when rendering high-fidelity detail. In this paper, we introduce 4D Gaussian Splatting (4DGS), a novel method that represents dynamic scenes with anisotropic 4D XYZT Gaussians, inspired by the success of 3D Gaussian Splatting in static scenes. We model dynamics at each timestamp by temporally slicing the 4D Gaussians, which naturally yields dynamic 3D Gaussians that can be seamlessly projected into images. As an explicit spatial-temporal representation, 4DGS demonstrates powerful capabilities for modeling complicated dynamics and fine details, especially for scenes with abrupt motions. We further implement our temporal slicing and splatting techniques in a highly optimized CUDA framework, achieving real-time rendering speeds of up to 277 FPS on an RTX 3090 GPU and 583 FPS on an RTX 4090 GPU. Rigorous evaluations on scenes with diverse motions showcase the superior efficiency and effectiveness of 4DGS, which consistently outperforms existing methods both quantitatively and qualitatively.

Overview

  • 4D Gaussian Splatting (4DGS) introduces a new approach to novel view synthesis for dynamic scenes, focusing on high-quality rendering and real-time performance.

  • The method builds upon 3D Gaussian Splatting by extending it into the temporal domain, creating an anisotropic 4D representation for natural scene dynamics.

  • A CUDA-optimized implementation achieves real-time rendering speeds and introduces novel loss functions for enhanced dynamic scene reconstruction.

  • Empirical results demonstrate superior performance over prior methods in rendering quality and speed, promising advancements in the VR/AR, gaming, and film industries.

Introduction

4D Gaussian Splatting (4DGS) is a novel approach that takes on the challenge of novel view synthesis (NVS) for dynamic scenes, a task intrinsically more complex than its static counterpart due to the extra temporal dimension and varying motion patterns. Traditional methods that rely on a canonical space plus deformation fields often struggle with high-fidelity rendering and with depicting abrupt motion, while volumetric methods fall short of real-time rendering due to the heavy computational demand of densely sampled rays.

Novel Approach

The paper presents a spatial-temporal representation that extends 3D Gaussian Splatting (3DGS) into the temporal domain. By encoding dynamic scenes with anisotropic 4D XYZT Gaussians, the method obtains an explicit model that composes scene dynamics naturally. Slicing these 4D Gaussians at a given timestamp yields dynamic 3D Gaussians that can be seamlessly projected onto images, retaining both the high rendering quality and the fast rendering speed inherited from 3DGS. To address the inherent challenges of designing 4D rotation features and a spatial-temporal optimization scheme, the authors employ geometric algebra and introduce a 4D rotor-based representation that offers intuitive and powerful rotation handling.
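To make the slicing step concrete, the sketch below conditions an XYZT Gaussian on a timestamp using standard Gaussian conditioning. It is a minimal illustrative sketch: the function name, the NumPy implementation, and the use of the temporal marginal as an opacity weight are assumptions for exposition, not the paper's CUDA kernel or its rotor-based covariance parameterization.

```python
import numpy as np

def slice_4d_gaussian(mu, cov, t):
    """Condition a 4D XYZT Gaussian on time t (illustrative sketch).
    Returns the 3D mean, 3D covariance, and a temporal weight that could
    modulate the Gaussian's opacity at this timestamp."""
    mu_x, mu_t = mu[:3], mu[3]          # spatial mean, temporal mean
    cov_xx = cov[:3, :3]                # spatial covariance block
    cov_xt = cov[:3, 3]                 # spatial-temporal coupling
    cov_tt = cov[3, 3]                  # temporal variance

    # Conditional 3D Gaussian at timestamp t (standard Gaussian conditioning)
    mean_3d = mu_x + cov_xt * (t - mu_t) / cov_tt
    cov_3d = cov_xx - np.outer(cov_xt, cov_xt) / cov_tt

    # Unnormalized temporal marginal: how "visible" the Gaussian is at t
    w_t = np.exp(-0.5 * (t - mu_t) ** 2 / cov_tt)
    return mean_3d, cov_3d, w_t

# Example: slice a 4D Gaussian slightly after its temporal center
mu = np.array([0.0, 0.0, 0.0, 0.5])
cov = np.diag([0.1, 0.1, 0.1, 0.05])
mean_3d, cov_3d, w_t = slice_4d_gaussian(mu, cov, t=0.6)
print(mean_3d, w_t)
```

Note that with a non-zero spatial-temporal coupling term, the sliced 3D mean moves as t varies, which is what lets a single 4D Gaussian represent motion.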

CUDA-Optimized Implementation

Temporal slicing and splatting are implemented in a highly optimized CUDA framework, achieving real-time inference speeds of up to 277 FPS on an RTX 3090 GPU and 583 FPS on an RTX 4090 GPU. Training is further regularized with an entropy loss and a novel 4D consistency loss that stabilize and improve dynamic reconstruction, suppressing "floaters," enhancing detail, and keeping the reconstructed dynamics consistent over time. These regularizations yield noticeable improvements in rigorous quantitative and qualitative evaluations.
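As a rough illustration of how such regularizers might look, the PyTorch sketch below implements a plausible opacity entropy term and a neighbor-based motion consistency term. The exact formulations, the loss weights, and the per-Gaussian `velocities` tensor are assumptions made for this example and are not taken from the paper.

```python
import torch

def opacity_entropy_loss(opacities, eps=1e-6):
    """Illustrative entropy regularizer: pushes per-Gaussian opacity toward
    0 or 1, which tends to suppress semi-transparent floaters.
    (Plausible form, not necessarily the paper's exact definition.)"""
    o = opacities.clamp(eps, 1.0 - eps)
    return (-o * torch.log(o)).mean()

def knn_consistency_loss(positions, velocities, k=8):
    """Illustrative 4D consistency regularizer (assumed form): Gaussians
    that are close in space are encouraged to share similar motion."""
    d = torch.cdist(positions, positions)                   # (N, N) distances
    knn_idx = d.topk(k + 1, largest=False).indices[:, 1:]   # drop self-match
    neighbor_vel = velocities[knn_idx]                       # (N, k, 3)
    return (velocities.unsqueeze(1) - neighbor_vel).pow(2).mean()

# Toy usage with random per-Gaussian parameters
opacities = torch.rand(1024, requires_grad=True)
positions = torch.randn(1024, 3)
velocities = torch.randn(1024, 3, requires_grad=True)
loss = opacity_entropy_loss(opacities) + 0.05 * knn_consistency_loss(positions, velocities)
loss.backward()
```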

Superior Results

The empirical results are compelling. On the Plenoptic Video Dataset, 4DGS outperforms prior state-of-the-art methods, achieving the highest PSNR of 31.62. The approach renders high-resolution videos efficiently, offering both a notable speed advantage and superior scene reconstruction quality. On monocular videos from the D-NeRF Dataset, 4DGS delivers a substantial leap in rendering quality while reaching 1258 FPS, significantly faster than previously reported methods.

Conclusion

In summary, 4DGS not only advances NVS for dynamic scenes by providing a practical spatial-temporal representation but also sets a new benchmark in speed and rendering fidelity. Its high performance and adaptability, as a unified framework suitable for both static and dynamic environments, hold great promise for future industrial applications in VR/AR, gaming, and film production. In addition, the release of the code offers a valuable asset to the community and may stimulate further research and development in dynamic scene rendering.
