SceneWiz3D: Towards Text-guided 3D Scene Composition

(2312.08885)
Published Dec 13, 2023 in cs.CV

Abstract

We are witnessing significant breakthroughs in the technology for generating 3D objects from text. Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets. Generating entire scenes, however, remains very challenging as a scene contains multiple 3D objects, diverse and scattered. In this work, we introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text. We marry the locality of objects with globality of scenes by introducing a hybrid 3D representation: explicit for objects and implicit for scenes. Remarkably, an object, being represented explicitly, can be either generated from text using conventional text-to-3D approaches, or provided by users. To configure the layout of the scene and automatically place objects, we apply the Particle Swarm Optimization technique during the optimization process. Furthermore, it is difficult for certain parts of the scene (e.g., corners, occlusion) to receive multi-view supervision, leading to inferior geometry. We incorporate an RGBD panorama diffusion model to mitigate it, resulting in high-quality geometry. Extensive evaluation supports that our approach achieves superior quality over previous approaches, enabling the generation of detailed and view-consistent 3D scenes.

Overview

  • SceneWiz3D presents a hybrid strategy for converting text descriptions into high-fidelity 3D scenes by combining explicit and implicit models.

  • It uses Particle Swarm Optimization for arranging objects within a scene and can incorporate user-specified objects.

  • The method improves geometry in regions that receive little multi-view supervision (e.g., corners and occluded areas) by using an RGBD panorama diffusion model.

  • The panorama diffusion model supplies depth supervision and a global understanding of scene structure, which improves scene fidelity.

  • The reported evaluation shows SceneWiz3D outperforming prior approaches at generating view-consistent 3D scenes with high-quality object details, with suggested applications in virtual reality, gaming, and film production.

Overview of SceneWiz3D

Synthesizing high-fidelity 3D scenes from text is challenging because a scene contains multiple objects that are diverse and scattered. SceneWiz3D addresses this with a hybrid 3D representation: explicit representations for individual objects and an implicit representation for the surrounding scene. This split allows detailed object generation alongside a flexible depiction of the environment.
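To make the explicit/implicit split concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the names `ExplicitObject`, `HybridScene`, and `floor_density` are hypothetical, the pose parameterization (position, yaw, uniform scale) is assumed, and the implicit part is reduced to a toy density function where the real method would use a learned field.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class ExplicitObject:
    """Hypothetical explicit object: stored geometry plus a pose in the scene."""
    vertices: List[Vec3]  # mesh vertices in the object's local frame
    position: Vec3        # translation into the scene (assumed parameterization)
    yaw: float            # rotation about the vertical (z) axis, in radians
    scale: float          # uniform scale factor

    def world_vertices(self) -> List[Vec3]:
        """Apply scale, then yaw rotation, then translation to every vertex."""
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        px, py, pz = self.position
        placed = []
        for x, y, z in self.vertices:
            x, y, z = x * self.scale, y * self.scale, z * self.scale
            placed.append((c * x - s * y + px, s * x + c * y + py, z + pz))
        return placed


@dataclass
class HybridScene:
    """Hypothetical container: explicit objects plus an implicit background."""
    objects: List[ExplicitObject]
    # The implicit part is queried at a 3D point rather than stored as geometry.
    background_density: Callable[[Vec3], float]


def floor_density(point: Vec3) -> float:
    """Toy implicit field (assumption): solid below z = 0, empty above."""
    return 1.0 if point[2] <= 0.0 else 0.0
```

In this sketch, moving an object only changes its pose fields, while the background is evaluated wherever a renderer samples it. That separation is what lets objects be swapped in or repositioned without touching the rest of the scene.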

Generating Scenes from Text Descriptions

SceneWiz3D generates intricate 3D scenes from text prompts. Objects can either be generated from text with conventional text-to-3D methods or be supplied by the user. They are then placed in the scene with Particle Swarm Optimization (PSO), a population-based search that balances exploration and exploitation to reduce the risk of getting stuck in local optima while looking for a suitable layout. Representing objects explicitly while keeping the rest of the scene implicit also makes it easier to portray scenes with varying depth ranges.
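The layout search can be illustrated with a textbook PSO loop. The sketch below is a generic version of the algorithm, not the paper's configuration: the inertia and acceleration coefficients (`w`, `c1`, `c2`), the particle count, and the function names are assumptions, and `toy_layout_cost` is a hand-written stand-in for whatever objective the method actually optimizes. It places two objects on a 4 x 4 floor, pulling them toward preferred spots while penalizing overlap.

```python
import random


def pso_minimize(cost, bounds, n_particles=30, n_iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Generic particle swarm minimizer over a box-bounded search space."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Start particles at random positions with zero velocity.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best position per particle
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]  # best position in the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia keeps exploring; the two pulls exploit known good spots.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost


def toy_layout_cost(layout):
    """Stand-in objective (assumption): two objects as points on a floor plan."""
    x1, y1, x2, y2 = layout
    dist = ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    overlap = max(0.0, 1.0 - dist) ** 2          # penalize objects closer than 1.0
    target = ((x1 - 1.0) ** 2 + (y1 - 1.0) ** 2
              + (x2 - 3.0) ** 2 + (y2 - 3.0) ** 2)
    return target + 10.0 * overlap


if __name__ == "__main__":
    best, best_cost = pso_minimize(toy_layout_cost, [(0.0, 4.0)] * 4)
    print("layout:", best, "cost:", best_cost)
```

With this toy objective the swarm should settle near (1, 1) for the first object and (3, 3) for the second. In a real pipeline the cost would be far more expensive to evaluate, which is one reason a gradient-free, population-based search is a plausible fit for layout parameters.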

Addressing Challenges in 3D Scene Generation

A prominent difficulty in 3D scene generation is recovering detailed geometry in regions that receive little multi-view supervision because of camera placement or occlusion, such as corners. SceneWiz3D mitigates this by integrating a diffusion model fine-tuned on panoramic RGBD images, which provides depth supervision and a better global understanding of scene structure. According to the authors, this yields scenes with higher fidelity than prior approaches.
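The summary does not say how the depth supervision is formulated. As one common possibility, assumed here rather than taken from the paper, a rendered depth map can be compared to a reference depth map after solving for the best scale and shift, since depth predicted by a generative model is often only defined up to such an affine ambiguity. The function names below are hypothetical, and depth maps are flattened to plain lists to keep the sketch dependency-free.

```python
def align_scale_shift(pred, ref):
    """Least-squares scale s and shift t minimizing sum((s * pred + t - ref)^2)."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0.0:
        # Constant prediction: only the shift can be fitted.
        return 0.0, mean_r
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    scale = cov / var_p
    shift = mean_r - scale * mean_p
    return scale, shift


def scale_shift_invariant_depth_loss(pred, ref):
    """Mean squared error between aligned rendered depth and reference depth."""
    scale, shift = align_scale_shift(pred, ref)
    return sum((scale * p + shift - r) ** 2 for p, r in zip(pred, ref)) / len(pred)


if __name__ == "__main__":
    rendered = [1.0, 2.0, 3.0, 4.0]                 # depth rendered from the 3D scene
    reference = [2.0 * d + 0.5 for d in rendered]   # same shape, different scale/shift
    print(scale_shift_invariant_depth_loss(rendered, reference))
```

Under this assumed formulation, a rendered depth map that matches the reference up to scale and shift incurs almost no penalty, while genuine shape disagreement (for example, a wall that bows where the reference is flat) is penalized.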

Evaluation of SceneWiz3D

SceneWiz3D is evaluated with metrics covering appearance, geometry, and overall consistency with the text description. It is reported to outperform prior approaches, generating view-consistent 3D scenes while preserving the quality and detail of individual objects within them. The method also adapts to diverse scene types and styles, which suggests applications in areas such as virtual reality, gaming, and film production.
