ReconFusion: 3D Reconstruction with Diffusion Priors

(arXiv:2312.02981)
Published Dec 5, 2023 in cs.CV

Abstract

3D reconstruction methods such as Neural Radiance Fields (NeRFs) excel at rendering photorealistic novel views of complex scenes. However, recovering a high-quality NeRF typically requires tens to hundreds of input images, resulting in a time-consuming capture process. We present ReconFusion to reconstruct real-world scenes using only a few photos. Our approach leverages a diffusion prior for novel view synthesis, trained on synthetic and multiview datasets, which regularizes a NeRF-based 3D reconstruction pipeline at novel camera poses beyond those captured by the set of input images. Our method synthesizes realistic geometry and texture in underconstrained regions while preserving the appearance of observed regions. We perform an extensive evaluation across various real-world datasets, including forward-facing and 360-degree scenes, demonstrating significant performance improvements over previous few-view NeRF reconstruction approaches.

Figure: diffusion model ablation on 3-view reconstruction, showing that NeRF reconstructions remain consistent despite randomness in the diffusion samples.

Overview

  • ReconFusion utilizes a diffusion model to aid in the 3D reconstruction of scenes from a minimal number of 2D images.

  • The diffusion model acts as an image prior, filling in likely appearances of unseen parts of a scene to improve reconstruction.

  • The technique effectively generates realistic geometries and textures for underconstrained areas, maintaining the quality of better-observed regions.

  • ReconFusion outperforms previous few-view NeRF reconstruction approaches, particularly when only a few input views are available.

  • This method reduces reliance on dense image captures and mitigates common artifacts in 3D reconstructions.

In the field of computer vision, creating 3D models from a collection of 2D images is a complex task that often requires a large number of images to achieve photo-realistic results. This is particularly true for Neural Radiance Fields (NeRF), a technique that excels at rendering highly realistic novel views of complex scenes. Unfortunately, capturing such a large number of images to cover every angle of a scene can be impractical and time-consuming.

A novel approach, termed ReconFusion, addresses this challenge by enabling the reconstruction of real-world scenes from just a handful of photos. The key innovation lies in leveraging a diffusion model, a type of generative model known for producing high-quality images, to guide the reconstruction process. The diffusion model, trained on synthetic and multiview datasets, functions as an image prior: given a few observed views, it can estimate what unseen parts of the scene might look like, and this estimate is used to regularize the 3D reconstruction pipeline at novel camera poses.
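To make that mechanism concrete, here is a minimal PyTorch sketch of how a view-conditioned image prior can regularize NeRF-style optimization. It is an illustrative toy under stated assumptions, not the paper's implementation: TinyRadianceField stands in for the real radiance field (volume rendering is omitted), StubDiffusionPrior stands in for the trained diffusion model, and prior_weight is an assumed hyperparameter.

```python
import torch
import torch.nn as nn

class TinyRadianceField(nn.Module):
    """Toy stand-in for a NeRF: maps ray origin+direction directly to RGB."""
    def __init__(self, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def render(self, rays):          # rays: (N, 6) origin + direction
        return self.mlp(rays)        # (N, 3) RGB; real volume rendering omitted

class StubDiffusionPrior(nn.Module):
    """Placeholder for a view-conditioned diffusion model."""
    def sample(self, observed_images, novel_rays):
        # A real prior would denoise toward a plausible novel view conditioned
        # on the observed photos; this stub just returns their mean color.
        mean_rgb = observed_images.mean(dim=(0, 1))          # (3,)
        return mean_rgb.expand(novel_rays.shape[0], 3)        # (M, 3)

def training_step(field, prior, observed_rays, observed_rgb, observed_images,
                  novel_rays, opt, prior_weight=0.1):
    """One step: photometric loss on captured views plus a prior-driven loss
    on renders from an unobserved pose (the regularization idea, in spirit)."""
    opt.zero_grad()

    # Reconstruction loss: match renders to the few captured photos.
    recon = ((field.render(observed_rays) - observed_rgb) ** 2).mean()

    # Prior loss: pull the novel-pose render toward what the "diffusion" prior
    # proposes for that view.
    with torch.no_grad():
        target = prior.sample(observed_images, novel_rays)
    prior_loss = ((field.render(novel_rays) - target) ** 2).mean()

    loss = recon + prior_weight * prior_loss
    loss.backward()
    opt.step()
    return loss.item()

if __name__ == "__main__":
    field = TinyRadianceField()
    prior = StubDiffusionPrior()
    opt = torch.optim.Adam(field.parameters(), lr=1e-3)
    # Fake data: 3 observed views of 256 rays each, plus rays for a novel pose.
    observed_rays = torch.randn(3 * 256, 6)
    observed_rgb = torch.rand(3 * 256, 3)
    observed_images = torch.rand(3, 256, 3)
    novel_rays = torch.randn(256, 6)
    for step in range(5):
        print(training_step(field, prior, observed_rays, observed_rgb,
                            observed_images, novel_rays, opt))
```

In the actual system, the prior is a diffusion model trained for novel view synthesis on synthetic and multiview data, so its outputs at unobserved poses are far more informative targets than this stub; the sketch is only meant to convey the overall structure of a reconstruction loss on captured views combined with a prior-driven term at novel poses.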

ReconFusion's process synthesizes realistic geometry and textures in regions of the scene that are underconstrained (i.e., have been observed from too few angles), while preserving the fidelity of the parts that have been captured from multiple perspectives. This technique has been rigorously tested on diverse datasets, including those that provide forward-facing or 360-degree views. It significantly outperforms existing NeRF-based methods for scenarios with minimal views.

Interestingly, ReconFusion not only helps when very few views are available; it can also improve quality and reduce common artifacts known as "floaters" even when many observations exist. It serves as a drop-in regularizer for NeRF, applicable to a variety of capture situations, making 3D reconstruction more accessible and less dependent on dense image captures.
