Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer (2404.14351v2)

Published 22 Apr 2024 in cs.CV

Abstract: We address the task of estimating camera parameters from a set of images depicting a scene. Popular feature-based structure-from-motion (SfM) tools solve this task by incremental reconstruction: they repeat triangulation of sparse 3D points and registration of more camera views to the sparse point cloud. We re-interpret incremental structure-from-motion as an iterated application and refinement of a visual relocalizer, that is, of a method that registers new views to the current state of the reconstruction. This perspective allows us to investigate alternative visual relocalizers that are not rooted in local feature matching. We show that scene coordinate regression, a learning-based relocalization approach, allows us to build implicit, neural scene representations from unposed images. Different from other learning-based reconstruction methods, we do not require pose priors nor sequential inputs, and we optimize efficiently over thousands of images. In many cases, our method, ACE0, estimates camera poses with an accuracy close to feature-based SfM, as demonstrated by novel view synthesis. Project page: https://nianticlabs.github.io/acezero/

Summary

The paper introduces the SCR framework that uses incremental scene coordinate regression to achieve fast and efficient camera pose estimation.
It utilizes an adapted ACE relocalizer (ACE0) with self-supervision to train directly on unposed images, reducing computational overhead.
The method is reported to process about 10,000 images in roughly an hour on a single GPU and is evaluated on both indoor and outdoor scenes.

Incremental Scene Reconstruction from Unposed Images Leveraging Neural Scene Representations

Introduction to Scene Coordinate Reconstruction

Scene reconstruction pipelines typically rely on feature-based structure-from-motion (SfM) tools, which incrementally build a sparse 3D model by triangulating points and registering new camera views to that model. These tools are effective, but they are rooted in local feature matching, which requires computationally intensive image-to-image correspondence searches. The paper reinterprets incremental SfM as the repeated application and refinement of a visual relocalizer, that is, a method that registers new views to the current state of the reconstruction. The authors propose scene coordinate regression, a learning-based relocalization approach, as the core mechanism of an alternative reconstruction paradigm called Scene Coordinate Reconstruction (SCR). Unlike other learning-based reconstruction methods, SCR requires neither pose priors nor sequentially ordered inputs, and it optimizes efficiently over thousands of images.

Key Contributions

SCR Framework: Introduces an SfM approach based on incremental learning of scene coordinate regression. Instead of matching local features between images, the network regresses image-to-scene correspondences directly, predicting 3D scene coordinates for image locations.
ACE0: An adapted version of the ACE relocalizer that estimates camera poses from unposed RGB images. It relies on fast relocalizer training and a self-supervised training scheme, so it can be used directly for SfM.
Efficiency and Self-supervision: The method starts from a single image and iteratively refines the relocalizer and scene model. The reported throughput is about 10,000 images in roughly an hour on a single GPU.
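
To make "image-to-scene correspondences" concrete, the following is a minimal sketch of how predicted scene coordinates can be scored against a candidate camera pose by counting reprojection inliers. It is not the ACE0 implementation: the function name count_inliers, the world-to-camera pose convention, and the 10-pixel default threshold are assumptions chosen for illustration.

```python
import numpy as np

def count_inliers(scene_coords, pixels, K, R, t, threshold_px=10.0):
    """Count 2D-3D correspondences that agree with a candidate pose.

    scene_coords: (N, 3) predicted 3D points in the scene frame.
    pixels:       (N, 2) image locations the predictions belong to.
    K:            (3, 3) pinhole intrinsics.
    R, t:         world-to-camera rotation (3, 3) and translation (3,).
    threshold_px: hypothetical inlier threshold in pixels.
    """
    cam = scene_coords @ R.T + t             # points in the camera frame
    in_front = cam[:, 2] > 1e-6              # ignore points behind the camera
    z = np.where(in_front, cam[:, 2], 1.0)   # avoid division by zero or negatives
    proj = cam @ K.T                         # homogeneous pixel coordinates
    uv = proj[:, :2] / z[:, None]            # perspective division
    err = np.linalg.norm(uv - pixels, axis=1)
    return int(np.sum(in_front & (err < threshold_px)))
```

A RANSAC-style pose solver would evaluate many pose hypotheses with a score of this kind and keep the one with the most inliers.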

Technical Overview and Implementation Details

SCR alternates between two phases: neural mapping and relocalization. In the neural mapping phase, a scene coordinate regressor is trained on the images registered so far, with their current pose estimates serving as pseudo ground truth. Training is fast because it builds on the ACE relocalizer, so the scene model can be refined quickly in every iteration. In the relocalization phase, the updated scene model is used to estimate poses for additional images, which enlarges the training set for the next mapping phase. Repeating these two phases handles thousands of images without prior knowledge of camera poses.
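
The alternation between mapping and relocalization can be sketched as a simple loop. This is a simplified illustration under stated assumptions, not the authors' code: train and relocalize are hypothetical callables supplied by the caller, min_inliers and max_rounds are made-up parameters, and only images that are not yet registered are attempted in each round.

```python
def incremental_reconstruction(image_ids, seed_id, seed_pose,
                               train, relocalize, min_inliers, max_rounds=20):
    """Alternate neural mapping and relocalization until no image is added.

    train(poses)            -> scene model fitted to the registered views
    relocalize(model, id)   -> (pose, inlier_count) for one image
    Both callables are hypothetical placeholders.
    """
    poses = {seed_id: seed_pose}              # registered images and their poses
    for _ in range(max_rounds):
        model = train(poses)                  # mapping: poses act as pseudo ground truth
        added = 0
        for image_id in image_ids:
            if image_id in poses:
                continue                      # already registered
            pose, inliers = relocalize(model, image_id)
            if inliers >= min_inliers:        # accept only confident estimates
                poses[image_id] = pose
                added += 1
        if added == 0 or len(poses) == len(set(image_ids)):
            break                             # converged or everything registered
    return poses
```

The loop stops either when every image is registered or when a full round adds nothing, which mirrors the dependence on successfully relocalizing new images in each iteration.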

Reconstruction starts from a single image whose pose is set to the identity matrix. A depth estimate for that image is back-projected to obtain initial scene coordinates, which bootstraps the iterative pipeline. Further progress depends on relocalizing a sufficient number of new images in each iteration; an image is accepted based on a confidence score derived from its RANSAC inlier count.
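
The initialization step can be illustrated with a small back-projection sketch. It assumes a pinhole camera with known intrinsics K and integer pixel coordinates, neither of which is specified in the text above; backproject_depth is a hypothetical helper name. With the seed pose fixed to the identity, camera-frame points are already scene coordinates.

```python
import numpy as np

def backproject_depth(depth, K):
    """Turn a depth map into per-pixel scene coordinates for the seed image.

    depth: (H, W) array of depth values along the optical axis.
    K:     (3, 3) pinhole intrinsics (assumed known here).
    Returns an (H, W, 3) array. Because the seed pose is the identity,
    camera coordinates and scene coordinates coincide.
    """
    h, w = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grid, each of shape (H, W)
    x = (u - cx) / fx * depth                       # scale the viewing ray by depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)
```

These coordinates would serve as the first training targets for the scene coordinate regressor before any other image has been registered.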

Analysis and Implications

The evaluation of ACE0 on indoor and outdoor benchmarks shows pose estimation accuracy that is, in many cases, close to that of feature-based SfM tools such as COLMAP and RealityCapture, as measured by novel view synthesis, with significantly reduced computational overhead. The method also handles large, unordered image collections, copes with varied scene depth, and does not depend on initial pose estimates.

Future Directions: Potential research could extend this framework's applicability to more dynamic environments and integrate more robust error-handling mechanisms during the relocalization phase. Additionally, exploring the integration of explicit feature matching as a fallback or hybrid approach could further enhance the model's adaptability and accuracy in complex scenes.

Conclusion

The SCR framework and its implementation in ACE0 mark a shift toward learning-based scene reconstruction. By combining incremental learning with efficient neural scene representations, the method avoids the feature matching at the core of traditional SfM while scaling to large image collections, which points toward more adaptive and robust reconstruction tools.
