
CityX: Controllable Procedural Content Generation for Unbounded 3D Cities (2407.17572v4)

Published 24 Jul 2024 in cs.CV and cs.AI

Abstract: Urban areas, as the primary human habitat in modern civilization, accommodate a broad spectrum of social activities. With the surge of embodied intelligence, recent years have witnessed an increasing presence of physical agents in urban areas, such as autonomous vehicles and delivery robots. As a result, practitioners significantly value crafting authentic, simulation-ready 3D cities to facilitate the training and verification of such agents. However, this task is quite challenging. Current generative methods fall short in either diversity, controllability, or fidelity. In this work, we resort to the procedural content generation (PCG) technique for high-fidelity generation. It assembles superior assets according to empirical rules, ultimately leading to industrial-grade outcomes. To ensure diverse and self-contained creation, we design a management protocol to accommodate extensive PCG plugins with distinct functions and interfaces. Based on this unified PCG library, we develop a multi-agent framework to transform multi-modal instructions, including OSM, semantic maps, and satellite images, into executable programs. The programs coordinate relevant plugins to construct the 3D city consistent with the control condition. A visual feedback scheme is introduced to further refine the initial outcomes. Our method, named CityX, demonstrates its superiority in creating diverse, controllable, and realistic 3D urban scenes. The synthetic scenes can be seamlessly deployed as a real-time simulator and an infinite data generator for embodied intelligence research. Our project page: https://cityx-lab.github.io.


Summary

  • The paper presents CityX, a system that combines a dynamic PCG management protocol with a multi-agent framework to generate realistic urban scenes.
  • It takes multimodal inputs (OSM, semantic maps, and satellite images) and uses them to guide asset selection and keep the generated scene consistent with the input.
  • Experiments indicate that CityX outperforms previous methods in geometric regularity and in aesthetic evaluation for unbounded 3D city generation.

Controllable Procedural Content Generation for Unbounded 3D Cities

Introduction

Procedural Content Generation (PCG) for large-scale 3D urban environments is difficult because it involves many kinds of 3D assets and must satisfy strict yet varied layout constraints. Existing methods address the problem with generative models or NeRF-based approaches, but they often fall short in scalability and fine-grained control. "CityX: Controllable Procedural Content Generation for Unbounded 3D Cities" (2407.17572) introduces a multi-modal controlled PCG method, named CityX, that generates realistic 3D cities guided by multimodal inputs such as OpenStreetMap (OSM), semantic maps, and satellite images.

Methodology

The proposed system, CityX, integrates a PCG management protocol with a multi-agent framework to generate and manage urban scenes effectively.

PCG Management Protocol

CityX introduces a dynamic protocol to integrate various PCG plugins, managing the complex interaction between these plugins and Blender’s action functions. This protocol comprises:

  • Dynamic API Conversion Interface: Facilitates the integration and communication among diverse PCG APIs, enabling seamless adaptation of different plugin formats.
  • Structured Encapsulation: Lowers the technical barriers for beginners by employing a consistent structure for action functions, making it easier to utilize complex PCG functionalities within the Blender environment.
  • Infinite Asset Libraries: Utilizes a continually expanding library of assets coupled with an innovative asset-retrieval system through pre-trained CLIP models, optimizing the asset selection process to fit city layout demands.
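The asset-retrieval idea in the last bullet can be illustrated with a minimal sketch. It assumes that each asset and the text query have already been embedded (in the paper's setting, by a pre-trained CLIP model) and ranks assets by cosine similarity, a common choice for comparing such embeddings. The three-dimensional vectors, the asset names, and the `retrieve_assets` helper are hypothetical stand-ins, not part of CityX.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def retrieve_assets(query_embedding, asset_embeddings, top_k=1):
    """Return the names of the top_k assets most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, embedding), name)
        for name, embedding in asset_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]

# Hypothetical toy embeddings; real CLIP embeddings have hundreds of dimensions.
ASSET_EMBEDDINGS = {
    "office_tower": [0.9, 0.1, 0.0],
    "street_tree": [0.0, 0.2, 0.9],
    "townhouse": [0.6, 0.7, 0.1],
}

# Hypothetical embedding of a query such as "tall commercial building".
query = [0.8, 0.2, 0.0]
best = retrieve_assets(query, ASSET_EMBEDDINGS, top_k=2)
print(best)
```

Because the ranking only depends on precomputed vectors, new assets can be added to the library by embedding them once, without changing the retrieval code.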

Multi-Agent Framework

This framework orchestrates the procedural generation process through multiple specialized agents:

  • Annotator: Labels action functions for easy access by other agents.
  • Planner: Devises a flexible workflow that adjusts to user inputs and system feedback, ensuring coherent task execution.
  • Executor: Operates within Blender to execute tasks using structured encapsulation, thereby enhancing system interactivity and accuracy.
  • Evaluator: Employs visual feedback to assess task completion, guiding the system towards more precise urban scene generation.

Figure 1: The proposed CityX, under the guidance of multimodal inputs including OSM data, semantic maps, and satellite images, facilitates the automatic creation of realistic large-scale 3D urban scenes.
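The division of labor among the four agents can be sketched as a small feedback loop. This is a toy illustration under stated assumptions, not the paper's implementation: the scene is a plain dictionary rather than a Blender scene, the agents are ordinary functions rather than LLM-driven agents, the evaluator checks dictionary keys instead of rendered images, and every name here (`ActionFunction`, `add_roads`, `generate`, and so on) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Scene = Dict[str, bool]

@dataclass
class ActionFunction:
    name: str
    description: str  # label written by the annotator
    run: Callable[[Scene], None]

def add_roads(scene: Scene) -> None:
    scene["roads"] = True

def add_buildings(scene: Scene) -> None:
    # Toy dependency: buildings can only be placed once roads exist.
    if scene.get("roads"):
        scene["buildings"] = True

def annotator() -> Dict[str, ActionFunction]:
    """Label each action function so other agents can look it up by name."""
    actions = [
        ActionFunction("roads", "lay out the road network", add_roads),
        ActionFunction("buildings", "place buildings along roads", add_buildings),
    ]
    return {action.name: action for action in actions}

def planner(missing: List[str]) -> List[str]:
    """Plan one step per element the evaluator reported as missing."""
    return list(missing)

def executor(plan: List[str], library: Dict[str, ActionFunction], scene: Scene) -> None:
    """Run each planned step through its labeled action function."""
    for step in plan:
        library[step].run(scene)

def evaluator(scene: Scene, required: List[str]) -> List[str]:
    """Report which required elements are still absent from the scene."""
    return [name for name in required if not scene.get(name)]

def generate(required: List[str], max_rounds: int = 3) -> Scene:
    library = annotator()
    scene: Scene = {}
    missing = list(required)
    for _ in range(max_rounds):
        if not missing:
            break
        executor(planner(missing), library, scene)
        missing = evaluator(scene, required)  # feedback for the next round
    return scene

# The request lists buildings before roads, so the first round fails to place
# buildings; the evaluator's feedback triggers a second, successful round.
scene = generate(["buildings", "roads"])
print(scene)
```

The point of the sketch is the control flow: the evaluator's report feeds back into the planner, so a step that fails in one round is retried once its precondition holds.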

Experiments and Results

Experiments demonstrate that CityX can produce realistic urban environments from various multimodal inputs. Beyond supporting diverse inputs, the framework improves scene realism and consistency compared with existing methods.

Comparative Study

CityX outperforms prior city generation methods such as CityDreamer and SceneDreamer under different input conditions by addressing issues such as asset overlap and repetitive structures. Its ability to maintain geometric regularity while keeping output quality high sets it apart.

Figure 2: Comparative results on city generation. Issues with unreasonable geometry are observed in previous works, while our method performs well in generating realistic large-scale city scenes.

Aesthetic Evaluation

In aesthetic evaluations involving both experts and volunteers, CityX achieved higher scores than the compared methods on both the aesthetic and rationality dimensions.

Figure 3: Urban scene generation with multimodal inputs, where we present an overhead view aligned with the multimodal input perspective, along with three street-level views.

Conclusions

CityX advances procedural content generation for urban environments by narrowing the gap between generated assets and industrial requirements. Its capacity for handling multimodal inputs and delivering scalable, high-resolution 3D city scenes highlights its potential contributions to the PCG community and its applications in the gaming, virtual reality, and animation industries.

The multi-agent framework is particularly noteworthy for how it handles complex interactions within urban generation tasks, providing a versatile and scalable solution that could inform future procedural generation systems. Future research may focus on improving the efficiency of parameter extraction and on broadening the diversity of generation techniques beyond existing procedural constraints.
