Abstract

In this paper, we target the adaptive source driven 3D scene editing task by proposing a CustomNeRF model that unifies a text description or a reference image as the editing prompt. However, obtaining editing results that conform to the prompt is nontrivial, as there are two significant challenges: accurately editing only the foreground regions, and maintaining multi-view consistency when given a single-view reference image. To tackle the first challenge, we propose a Local-Global Iterative Editing (LGIE) training scheme that alternates between foreground-region editing and full-image editing, enabling foreground-only manipulation while preserving the background. For the second challenge, we design a class-guided regularization that exploits class priors within the generation model to alleviate the inconsistency problem among different views in image-driven editing. Extensive experiments show that our CustomNeRF produces precise editing results across various real scenes in both text- and image-driven settings.

Overview

  • Neural Radiance Fields (NeRF) enable photo-realistic 3D scene reconstruction by modeling a scene's geometry and view-dependent appearance with neural networks.

  • The paper presents a method for 3D scene editing that allows users to modify foreground objects while preserving the background.

  • Local-Global Iterative Editing (LGIE) is proposed to concentrate edits on the foreground and ensure scene-wide consistency.

  • Class-guided regularization using Text-to-Image (T2I) models maintains geometric consistency across different viewing angles.

  • CustomNeRF, the proposed model, demonstrates precise, realistic editing in real scenes for both text- and image-driven inputs.

Overview of Neural Radiance Fields and Scene Editing

Neural Radiance Fields (NeRF) have become a standard tool for creating realistic 3D scenes that can be rendered from any viewpoint. A NeRF represents a scene with a neural network that maps a 3D position and viewing direction to color and volume density; integrating these values along camera rays via volume rendering produces photorealistic images. Advances in NeRF have spurred research into 3D scene editing, where objects within a scene can be re-textured, restyled, or replaced to suit different needs. However, editing NeRF scenes directly is challenging: edits must be confined to specific areas, known as foreground regions, while remaining consistent across different viewpoints.
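To make the underlying representation concrete, here is a minimal sketch of a NeRF-style network in PyTorch. It is illustrative only: the class names, layer sizes, and encoding frequencies are assumptions for exposition, and ray sampling plus volume rendering are omitted.

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Map coordinates to sin/cos features so the MLP can fit high-frequency detail."""
    def __init__(self, num_freqs):
        super().__init__()
        self.register_buffer("freqs", (2.0 ** torch.arange(num_freqs)) * math.pi)

    def forward(self, x):                   # x: (N, 3)
        angles = x[..., None] * self.freqs  # (N, 3, num_freqs)
        return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(-2)

class TinyNeRF(nn.Module):
    """Toy radiance field: (3D position, view direction) -> (density, RGB)."""
    def __init__(self, hidden=128, pos_freqs=10, dir_freqs=4):
        super().__init__()
        self.pos_enc = PositionalEncoding(pos_freqs)
        self.dir_enc = PositionalEncoding(dir_freqs)
        self.trunk = nn.Sequential(
            nn.Linear(6 * pos_freqs, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)
        self.color_head = nn.Sequential(
            nn.Linear(hidden + 6 * dir_freqs, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, xyz, view_dir):
        h = self.trunk(self.pos_enc(xyz))
        density = torch.relu(self.density_head(h))  # non-negative volume density
        rgb = self.color_head(torch.cat([h, self.dir_enc(view_dir)], dim=-1))
        return density, rgb

# Query the field at 1024 sample points (ray sampling and rendering omitted).
model = TinyNeRF()
density, rgb = model(torch.rand(1024, 3), torch.rand(1024, 3))
print(density.shape, rgb.shape)  # torch.Size([1024, 1]) torch.Size([1024, 3])
```

Because the density head depends only on position while the color head also sees the viewing direction, the same point can look different from different angles, which is what makes view-dependent effects like specular highlights possible.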

Adaptive Source Driven 3D Scene Editing

The paper introduces a solution for customized 3D scene editing by incorporating adaptive source input, either in the form of text descriptions or reference images. This allows for the modification of a scene's foreground while keeping its background unchanged, tackling a common difficulty in prior work where changes could inadvertently affect untargeted parts of the scene.

Local-Global Iterative Editing

To concentrate edits on the foreground, the authors propose a Local-Global Iterative Editing (LGIE) training scheme that alternates between local stages, which focus on the foreground region, and global stages, which consider the entire scene. The scheme relies on a foreground-aware NeRF that estimates which parts of the scene should be edited. By switching the training objective between the foreground and the whole scene as needed, the method preserves the original layout and background details.
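A rough sketch of how such an alternating loop might look is given below. The helper names (`render_random_view`, `guidance_loss`) and the even/odd alternation schedule are hypothetical stand-ins for exposition, not the paper's actual implementation.

```python
# Hypothetical sketch of one LGIE optimization step. Helper names
# (render_random_view, guidance_loss) are illustrative, not the paper's API.

def lgie_step(nerf, optimizer, guidance_loss, step, fg_prompt, full_prompt):
    """Alternate between foreground-only (local) and full-image (global) editing."""
    local_stage = step % 2 == 0  # assumed 1:1 alternation schedule

    # A foreground-aware NeRF renders an image plus a foreground mask
    # indicating which pixels belong to the region being edited.
    rgb, fg_mask = nerf.render_random_view()

    if local_stage:
        # Local stage: supervise only the masked foreground with the editing
        # prompt, so gradients do not disturb the background.
        loss = guidance_loss(rgb * fg_mask, fg_prompt)
    else:
        # Global stage: supervise the full rendering so the edited foreground
        # blends coherently with the preserved background and scene layout.
        loss = guidance_loss(rgb, full_prompt)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss
```

In practice, `guidance_loss` would be a score-distillation-style objective computed by a pretrained text-to-image diffusion model, and the foreground mask would come from the foreground-aware NeRF itself.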

Class-guided Regularization for Image-driven Editing

Another challenge arises when editing is guided by a single-view reference image, which can produce inconsistencies when the scene is rendered from other perspectives. The authors address this with a class-guided regularization technique: a Text-to-Image (T2I) model encodes the visual subject of the reference image into a textual prompt, and during editing the T2I model's general class priors then guide geometric consistency across views.
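One way to picture this regularization is as prompt construction: the reference subject is bound to a learned pseudo-token, and on some training steps only the general class word is used, so the T2I model's broad class prior, learned from many viewpoints, shapes the geometry. The token names and the 50/50 schedule in the sketch below are assumptions for illustration, not the paper's exact scheme.

```python
# Illustrative sketch of class-guided prompt regularization. The pseudo-token,
# class word, and mixing schedule are assumptions, not the paper's exact scheme.
import random

SUBJECT_TOKEN = "<ref>"  # pseudo-word assumed to be learned from the reference image
CLASS_WORD = "dog"       # the general class the subject belongs to

def build_prompt(template="a photo of a {} in a garden"):
    """Sometimes drop the subject token and keep only the class word, so the
    T2I model's broad class prior (seen from many viewpoints) regularizes
    geometry instead of overfitting to the single-view reference appearance."""
    use_class_prior = random.random() < 0.5  # assumed mixing schedule
    subject = CLASS_WORD if use_class_prior else f"{SUBJECT_TOKEN} {CLASS_WORD}"
    return template.format(subject)

for _ in range(4):
    print(build_prompt())
```

The intuition is that the reference image pins down the subject's appearance from one view only, while the class word taps the diffusion model's knowledge of what such objects look like from all sides.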

Results and Conclusions

The model, named CustomNeRF, is shown to produce precise editing results in various real scenes for both text- and image-driven settings. Extensive experiments reveal that CustomNeRF can effectively modify the specified regions in a photo-realistic manner, demonstrating the value of LGIE and class-guided regularization for 3D scene editing. These contributions mark a significant step toward letting users customize scenes to their specific needs and preferences, broadening the accessibility and flexibility of NeRF-based editing tools.
