Emergent Mind

NeRFiller: Completing Scenes via Generative 3D Inpainting

(2312.04560)
Published Dec 7, 2023 in cs.CV, cs.AI, and cs.GR

Abstract

We propose NeRFiller, an approach that completes missing portions of a 3D capture via generative 3D inpainting using off-the-shelf 2D visual generative models. Often parts of a captured 3D scene or object are missing due to mesh reconstruction failures or a lack of observations (e.g., contact regions, such as the bottom of objects, or hard-to-reach areas). We approach this challenging 3D inpainting problem by leveraging a 2D inpainting diffusion model. We identify a surprising behavior of these models, where they generate more 3D consistent inpaints when images form a 2×2 grid, and show how to generalize this behavior to more than four images. We then present an iterative framework to distill these inpainted regions into a single consistent 3D scene. In contrast to related works, we focus on completing scenes rather than deleting foreground objects, and our approach does not require tight 2D object masks or text. We compare our approach to relevant baselines adapted to our setting on a variety of scenes, where NeRFiller creates the most 3D consistent and plausible scene completions. Our project page is at https://ethanweber.me/nerfiller.

Overview

  • NeRFiller addresses incomplete 3D scene data by introducing a generative 3D inpainting method that employs 2D image inpainting models for scene completion.

  • Traditional 3D scanning can leave unobserved regions, and NeRFiller fills these areas with increased multi-view consistency without requiring object masks or textual prompts.

  • A new technique, Joint Multi-View Inpainting, is used, allowing for improved consistency across multiple views during the scene completion process.

  • Comparisons show that NeRFiller completes scenes with greater coherence and plausibility than existing methods and allows user-guided inpainting using reference images.

  • NeRFiller faces challenges in generating high-resolution details far from observed viewpoints and in applying to casual captures, indicating room for future enhancements.

Overview of Generative 3D Inpainting

The emergence of 3D scene capture technology has accelerated the creation of immersive worlds but often suffers from incomplete data due to occlusions or missing observations. Bridging these gaps in 3D environments is crucial for applications ranging from virtual reality to film production. A novel approach, NeRFiller (Neural Radiance Filler), addresses the challenge by introducing a generative 3D inpainting strategy that utilizes existing 2D image inpainting models to effectively complete three-dimensional scenes.

The Shortcomings in Capturing Complete 3D Scenes

3D scanning, while sophisticated, frequently yields scenes with unobserved regions or undesired elements. Editing these 3D captures to fill in or modify content requires consistency across multiple views, a task that proves difficult with models oriented toward 2D image generation, which lack inherent 3D understanding.

NeRFiller's Innovative Approach

NeRFiller leverages the capabilities of 2D inpainting diffusion models, uncovering their propensity to produce more consistent three-dimensional inpaints when multiple images are arranged in a specific grid pattern. This discovery is harnessed in a new technique, Joint Multi-View Inpainting, which allows more than four images to be inpainted with increased multi-view consistency. In an iterative process, these 2D inpaints are distilled into a cohesive 3D scene representation, resulting in plausible and 3D-consistent scene completions.
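The grid trick described above can be illustrated with a minimal sketch. The helpers below (`make_grid`, `split_grid` are hypothetical names, not the authors' code) tile four equally sized views into one 2×2 grid image, which would then be passed as a single input to an off-the-shelf 2D inpainting diffusion model, and split the result back into per-view tiles:

```python
import numpy as np

def make_grid(views):
    """Tile four HxWx3 images into one 2Hx2Wx3 image (a 2x2 grid)."""
    assert len(views) == 4, "the grid trick operates on four views at a time"
    top = np.concatenate(views[:2], axis=1)
    bottom = np.concatenate(views[2:], axis=1)
    return np.concatenate([top, bottom], axis=0)

def split_grid(grid):
    """Invert make_grid: recover the four HxWx3 tiles from the grid image."""
    H, W = grid.shape[0] // 2, grid.shape[1] // 2
    return [grid[i * H:(i + 1) * H, j * W:(j + 1) * W]
            for i in (0, 1) for j in (0, 1)]
```

Because the diffusion model denoises the whole grid jointly, the four tiles share context, which is what encourages the more 3D-consistent inpaints the paper reports; Joint Multi-View Inpainting generalizes this beyond four images.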

The method requires neither tight object masks nor textual prompts, relying on scene context alone. It stands apart from baseline methods that either generate new scenes from scratch or remove objects, offering a targeted remedy for scenes with partial data.
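The iterative distillation described above can be caricatured as a toy loop: render the current scene, inpaint the masked region with a 2D model, and fold the inpainted views back into the scene. In this sketch the "scene" is just a 2D array and `inpaint_view` is a stub that paints a constant; both are stand-ins for a NeRF and a diffusion model, not NeRFiller's actual procedure:

```python
import numpy as np

def inpaint_view(view, mask, fill):
    """Toy stand-in for a 2D inpainting model: paint masked pixels."""
    out = view.copy()
    out[mask] = fill
    return out

def distill(scene, mask, fill, n_iters=5, blend=0.5):
    """Toy analogue of iterative distillation: repeatedly inpaint
    renders of the current scene and blend them back in."""
    for _ in range(n_iters):
        render = scene  # in NeRFiller this would be a NeRF render per view
        inpainted = inpaint_view(render, mask, fill)
        scene = (1 - blend) * scene + blend * inpainted
    return scene
```

Each pass pulls the masked region closer to the inpainted content while leaving observed pixels untouched, mirroring how repeated inpaint-and-retrain rounds converge on a single consistent 3D completion.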

Implementation and Results

NeRFiller's effectiveness is demonstrated through comparisons with existing techniques across a variety of scenes. The approach completes scenes more coherently and plausibly than competing methods. NeRFiller also gives users control over the inpainting process: reference images can be supplied to guide the outcome.

Limitations and Future Directions

Despite substantial progress, NeRFiller struggles to generate high-resolution detail in regions far from the observed viewpoints, and applying the method to casual captures remains difficult because their mask patterns are out of distribution for existing 2D inpainting models. These areas present opportunities for future work.

Conclusion

NeRFiller takes significant strides in 3D content generation. By providing a scene-completion method conditioned on multi-view images, it unlocks new potential for refining 3D captures, paving the way toward more seamless and detailed virtual environments.
