Abstract

3D city generation with NeRF-based methods shows promising generation results but is computationally inefficient. Recently, 3D Gaussian Splatting (3D-GS) has emerged as a highly efficient alternative for object-level 3D generation. However, adapting 3D-GS from finite-scale 3D objects and humans to infinite-scale 3D cities is non-trivial. Unbounded 3D city generation entails significant storage overhead (out-of-memory issues), arising from the need to expand to billions of points, often demanding hundreds of gigabytes of VRAM for a city scene spanning 10 km². In this paper, we propose GaussianCity, a generative Gaussian Splatting framework dedicated to efficiently synthesizing unbounded 3D cities with a single feed-forward pass. Our key insights are two-fold: 1) Compact 3D Scene Representation: We introduce BEV-Point as a highly compact intermediate representation, ensuring that VRAM usage remains constant as scenes grow, thus enabling unbounded city generation. 2) Spatial-aware Gaussian Attribute Decoder: We present a spatial-aware BEV-Point decoder to produce 3D Gaussian attributes, which leverages a Point Serializer to integrate the structural and contextual characteristics of BEV points. Extensive experiments demonstrate that GaussianCity achieves state-of-the-art results in both drone-view and street-view 3D city generation. Notably, compared to CityDreamer, GaussianCity exhibits superior performance with a speedup of 60 times (10.72 FPS vs. 0.18 FPS).

Overview

  • GaussianCity introduces a novel framework for generating extensive 3D cityscapes using 3D Gaussian splatting, overcoming computational inefficiencies of traditional NeRF-based methods.

  • The framework utilizes a compact 3D scene representation called BEV-Point to manage VRAM usage effectively, along with a Spatial-aware Gaussian Attribute Decoder for enhanced scene quality.

  • Experiments demonstrate that GaussianCity significantly outperforms existing methods in visual quality and computational efficiency, verified through quantitative metrics and qualitative comparisons on datasets like GoogleEarth and KITTI-360.

GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation

Introduction

The paper "GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation" introduces a novel framework designed to address the inefficiencies and scalability issues in the generation of expansive 3D cityscapes. Traditional NeRF-based methods capture intricate details but are computationally demanding, limiting their applicability to large-scale scenes. The proposed method leverages the efficiency of 3D Gaussian splatting (3D-GS) to overcome these limitations, rendering extensive urban environments with significantly reduced computational overhead.

Key Contributions

The contributions of this paper are multifaceted:

  1. Compact 3D Scene Representation: The introduction of BEV-Point as a highly compact intermediate representation mitigates VRAM usage growth, ensuring constant memory requirements regardless of the scene's expanse.
  2. Spatial-aware Gaussian Attribute Decoder: This novel decoder integrates structural and contextual characteristics of BEV points, enhancing the representation quality and consistency of the generated scenes.

Methodology

The GaussianCity framework hinges on two pivotal components:

  1. BEV-Point Initialization: This compact scene representation keeps VRAM usage constant by considering only the BEV points visible from the current camera during rendering and optimization. Visibility is determined via ray intersection against the BEV maps, which comprise a height field, a semantic map, and a binary density map; a minimal sketch of this filtering step follows this list.
  2. BEV-Point Decoder: This decoder employs a point serializer and a point transformer to generate 3D Gaussian attributes. The point serializer restructures unstructured BEV points into an ordered sequence, while the point transformer processes the serialized features to preserve spatial correlations; a sketch of one possible serialization scheme is also given below.
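
The snippet below is a minimal sketch (in NumPy, not the authors' code) of how BEV-Point visibility filtering could work: occupied cells of the binary density map are lifted to 3D using the height field, transformed into the camera frame, and kept only if they project inside the image. The function name `visible_bev_points`, the pinhole intrinsics `K`, and the `world_to_cam` matrix are illustrative assumptions; a full implementation would also resolve occlusion (e.g., with a depth buffer) rather than rely on frustum culling alone.

```python
# Minimal sketch of BEV-Point visibility filtering (illustrative, not the paper's code).
# Assumptions: BEV maps are HxW NumPy arrays on a regular grid; `K` is a 3x3 pinhole
# intrinsic matrix; `world_to_cam` is a 4x4 extrinsic matrix. Only the top surface
# point of each occupied column is kept for brevity.
import numpy as np

def visible_bev_points(height_field, semantic_map, density_mask, K, world_to_cam,
                       image_hw, cell_size=1.0):
    """Return world coordinates and semantic labels of BEV points inside the view frustum."""
    ys, xs = np.nonzero(density_mask)                  # occupied BEV cells only
    world = np.stack([xs * cell_size,
                      ys * cell_size,
                      height_field[ys, xs]], axis=1)   # (N, 3) world-space points

    # Transform to camera space and keep points in front of the camera.
    homo = np.concatenate([world, np.ones((len(world), 1))], axis=1)
    cam = (world_to_cam @ homo.T).T[:, :3]
    in_front = cam[:, 2] > 1e-6

    # Project with the pinhole model and keep points that land inside the image.
    proj = (K @ cam[in_front].T).T
    uv = proj[:, :2] / proj[:, 2:3]
    h_img, w_img = image_hw
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w_img) & (uv[:, 1] >= 0) & (uv[:, 1] < h_img)

    keep = np.nonzero(in_front)[0][inside]
    return world[keep], semantic_map[ys[keep], xs[keep]]
```

The point serializer is described only at a high level here, so the sketch below assumes a Morton (Z-order) space-filling curve, one common way to order unstructured points so that spatial neighbors land near each other in the 1D sequence consumed by a point transformer; the paper's exact serialization may differ.

```python
# Minimal sketch of a point serializer (an assumed Z-order/Morton scheme, not
# necessarily the paper's exact serialization). Points are quantized to a grid and
# sorted by their interleaved-bit code so that spatially adjacent points end up
# close together in the 1D sequence a point transformer consumes.
import numpy as np

def _spread_bits(v, bits=10):
    """Spread the low `bits` bits of each value so they occupy every third bit position."""
    v = v.astype(np.int64)
    out = np.zeros_like(v)
    for i in range(bits):
        out |= ((v >> i) & 1) << (3 * i)
    return out

def morton_order(points, grid_size=1024):
    """Return indices that sort `points` (N, 3) along a Z-order (Morton) curve."""
    mins, maxs = points.min(axis=0), points.max(axis=0)
    extent = np.maximum(maxs - mins, 1e-9)
    q = ((points - mins) / extent * (grid_size - 1)).astype(np.int64)
    codes = (_spread_bits(q[:, 0])
             | (_spread_bits(q[:, 1]) << 1)
             | (_spread_bits(q[:, 2]) << 2))
    return np.argsort(codes)

# Usage: reorder BEV-point features before the point transformer, e.g.
#   order = morton_order(xyz)
#   serialized_feats = feats[order]
```

Ordering features along such a curve before attention keeps each transformer window over a spatially coherent neighborhood, which is presumably the property the decoder relies on to maintain spatial correlations.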

Experimental Results

The efficacy of GaussianCity is validated through extensive experiments on the GoogleEarth and KITTI-360 datasets, showcasing superior performance in terms of visual quality and computational efficiency.

  • Quantitative Metrics: On the GoogleEarth dataset, GaussianCity achieves markedly lower FID and KID scores (86.94 and 0.090, respectively) than the state-of-the-art CityDreamer, which scores 97.38 and 0.096. GaussianCity also far outperforms CityDreamer in runtime efficiency, achieving a 60× speedup (10.72 FPS vs. 0.18 FPS). Similar trends hold on the KITTI-360 dataset, where GaussianCity achieves FID and KID scores of 29.5 and 0.017, respectively.
  • Qualitative Comparisons: Visual inspections reveal that GaussianCity excels in preserving structural details and handling complex textures, outperforming methods like PersistentNature, SceneDreamer, and InfiniCity. The reduction of artifacts and more consistent multi-view generation underscore the method's robustness.

Implications and Future Developments

The implications of GaussianCity are significant across multiple domains, including gaming, virtual reality, and urban planning. By reducing memory overhead and enhancing rendering speed, this method makes real-time, large-scale 3D city generation feasible. The introduction of compact representations and efficient decoders lays the groundwork for future research focusing on further optimization and broader applicability.

Future research may explore generating additional Gaussian attributes, such as xyz offsets, opacity, and scale, to fully harness the representational capacity of 3D Gaussian splatting. Additionally, improving the BEV-Point Initialization process to handle more complex structures, beyond the Manhattan assumption, could further enhance the realism of generated scenes.

Conclusion

GaussianCity represents a significant advancement in the field of 3D city generation. By leveraging a compact representation and an efficient decoder, it addresses key limitations of traditional methods, enabling the generation of unbounded 3D cityscapes with high realism and efficiency. This work establishes a solid foundation for continued innovation in scalable and efficient 3D scene generation techniques.
