Emergent Mind

COLMAP-Free 3D Gaussian Splatting

(2312.07504)
Published Dec 12, 2023 in cs.CV

Abstract

While neural rendering has led to impressive advances in scene reconstruction and novel view synthesis, it relies heavily on accurately pre-computed camera poses. To relax this constraint, multiple efforts have been made to train Neural Radiance Fields (NeRFs) without pre-processed camera poses. However, the implicit representations of NeRFs provide extra challenges to optimize the 3D structure and camera poses at the same time. On the other hand, the recently proposed 3D Gaussian Splatting provides new opportunities given its explicit point cloud representations. This paper leverages both the explicit geometric representation and the continuity of the input video stream to perform novel view synthesis without any SfM preprocessing. We process the input frames in a sequential manner and progressively grow the 3D Gaussians set by taking one input frame at a time, without the need to pre-compute the camera poses. Our method significantly improves over previous approaches in view synthesis and camera pose estimation under large motion changes. Our project page is https://oasisyang.github.io/colmap-free-3dgs

Overview

  • The paper leverages the recently proposed 3D Gaussian Splatting to address challenges in novel view synthesis and 3D scene reconstruction without pre-computed camera poses.

  • It presents a new method, CF-3DGS, which eschews the need for SfM pre-processing and instead processes input video frames sequentially.

  • By employing dual-local-global optimization, CF-3DGS achieves improved scene reconstruction and camera pose predictions.

  • CF-3DGS demonstrates robust pose estimation and higher quality view synthesis, even in cases of extensive camera motion like 360-degree video recordings.

  • The method promises significant implications for the creation of realistic virtual experiences in various applications.

Introduction to Novel View Synthesis

The evolving domain of photo-realistic scene reconstruction and novel view synthesis has seen remarkable progress, particularly with the advent of Neural Radiance Fields (NeRF). These developments hinge on upfront computation of camera poses, traditionally derived from Structure-from-Motion (SfM) techniques, such as those provided by the COLMAP library. However, pre-computed camera poses can create bottlenecks and limitations.

Challenges and Innovations

NeRF's implicit scene representation imposes inherent constraints when the 3D scene structure and camera poses must be determined simultaneously. For instance, methods like 'Nope-NeRF' struggle when camera poses change substantially, a common case in 360-degree video recordings. Integrating camera pose estimation within the NeRF framework has long been a chicken-and-egg problem: accurate poses are needed to learn the scene, yet a good scene model is needed to recover the poses.

Enter 3D Gaussian Splatting. Its explicit point cloud representation offers a new perspective, opening a window for bypassing pre-computed camera pose estimation. Recognizing this potential, the authors propose COLMAP-Free 3D Gaussian Splatting (CF-3DGS). This method harnesses the explicit geometric representation together with the temporal continuity of video frames to perform novel view synthesis without any SfM pre-processing.
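To make the contrast with NeRF's implicit representation concrete, the sketch below shows what an explicit Gaussian point cloud might look like. The class and field names are illustrative assumptions, not the paper's implementation; the point is that the scene is a plain set of parameters that can be re-posed or grown directly:

```python
import numpy as np

class GaussianCloud:
    """A minimal, explicit point-based scene representation in the spirit
    of 3D Gaussian Splatting. All names and fields are illustrative."""

    def __init__(self, points: np.ndarray, colors: np.ndarray):
        n = points.shape[0]
        self.means = points.astype(np.float64)            # (N, 3) centers
        self.colors = colors.astype(np.float64)           # (N, 3) RGB
        self.scales = np.full((n, 3), 0.01)               # per-axis extent
        self.rotations = np.tile([1.0, 0.0, 0.0, 0.0], (n, 1))  # unit quaternions
        self.opacities = np.full((n, 1), 0.5)             # blending weights

    def transform(self, R: np.ndarray, t: np.ndarray) -> None:
        """Rigidly move every Gaussian center. Because the cloud is an
        explicit set of points, re-posing it is a single matrix product,
        with no network re-evaluation required."""
        self.means = self.means @ R.T + t
```

This explicitness is what lets a pose estimate be applied, checked, and refined directly against the geometry, rather than being entangled with the weights of an implicit network.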

CF-3DGS Methodology

CF-3DGS operates by processing input video frames sequentially, allowing the set of 3D Gaussians to grow as the camera navigates through space. Each incoming frame yields an updated local 3D Gaussian set, which is then registered against and merged into a global representation of the scene. This dual local-global optimization, working with both current and previous frames, yields significantly better scene reconstruction and camera pose predictions when benchmarked against other SfM-free methods.
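The sequential local-to-global scheme described above can be sketched roughly as follows. This is only an outline under stated assumptions: `estimate_relative_pose` is a hypothetical stand-in for the paper's photometric pose optimization, and the global refinement step is indicated only in comments:

```python
import numpy as np

def estimate_relative_pose(global_cloud, frame):
    """Hypothetical stand-in: in CF-3DGS the relative pose is obtained by
    optimizing a rendering loss between a local Gaussian set (built from
    the previous frame) and the new frame. Here we return the identity."""
    return np.eye(3), np.zeros(3)

def cf_3dgs_sketch(frames, init_cloud):
    """Illustrative outline of the sequential pipeline: each new frame
    yields a relative pose, which is chained onto the previous camera
    and used to grow and refine the global Gaussian set."""
    global_cloud = init_cloud
    poses = [(np.eye(3), np.zeros(3))]  # first camera fixed as world origin
    for frame in frames[1:]:
        # Local step: estimate the relative pose of the new frame.
        R_rel, t_rel = estimate_relative_pose(global_cloud, frame)
        # Chain the relative motion onto the previous absolute pose.
        R_prev, t_prev = poses[-1]
        poses.append((R_rel @ R_prev, R_rel @ t_prev + t_rel))
        # Global step: add newly observed Gaussians to global_cloud and
        # jointly refine the scene and all poses so far (omitted here).
    return poses
```

Because only one frame's pose is optimized at a time against an already-fitted scene, each local problem stays small, which is part of why the method remains robust under large camera motion.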

Advantages and Results

The proposed CF-3DGS method sets itself apart with robust pose estimation and higher-quality novel view synthesis across a variety of scenes. It is not confined to small camera motions: its performance excels notably under wide-ranging camera movements, particularly in 360-degree video captures. Moreover, it achieves results on par with state-of-the-art methods like 'Nope-NeRF' with substantially shorter training times.

Future Implications

In the pursuit of mirroring and manipulating reality through technology, methods like CF-3DGS edge us closer to seamless and realistic virtual experiences. Whether for entertainment, simulation, or education, the impacts of such advances open unexplored doors to how we might interact with and visualize our surroundings through the lens of artificial intelligence.
