Global Structure-from-Motion Revisited

(2407.20219)
Published Jul 29, 2024 in cs.CV

Abstract

Recovering 3D structure and camera motion from images has been a long-standing focus of computer vision research and is known as Structure-from-Motion (SfM). Solutions to this problem are categorized into incremental and global approaches. Until now, the most popular systems follow the incremental paradigm due to its superior accuracy and robustness, while global approaches are drastically more scalable and efficient. With this work, we revisit the problem of global SfM and propose GLOMAP as a new general-purpose system that outperforms the state of the art in global SfM. In terms of accuracy and robustness, we achieve results on par with or superior to COLMAP, the most widely used incremental SfM, while being orders of magnitude faster. We share our system as an open-source implementation at https://github.com/colmap/glomap.

GLOMAP system pipeline integrating translation averaging and triangulation into a single global positioning step.

Overview

  • The paper presents GLOMAP, a general-purpose global Structure-from-Motion system that matches or exceeds the accuracy and robustness of COLMAP while running much faster.

  • GLOMAP replaces the translation averaging step of traditional global SfM with a single global positioning step that estimates camera and point positions jointly.

  • Evaluations across several datasets show that GLOMAP outperforms other global SfM systems and is on par with or better than COLMAP in recall and accuracy, at a much lower computational cost.

Global Structure-from-Motion Revisited: An Analytical Overview

The paper "Global Structure-from-Motion Revisited" by Linfei Pan, Dániel Baráth, Marc Pollefeys, and Johannes L. Schönberger introduces GLOMAP, a new Structure-from-Motion (SfM) system. GLOMAP follows the global SfM approach and aims to improve its accuracy and robustness while keeping its scalability.

Background and Introduction

Structure-from-Motion (SfM) is a fundamental problem in computer vision: recovering 3D structure and camera motion from a collection of images. The problem is traditionally tackled with either incremental or global approaches. Incremental SfM is accurate and robust but scales poorly because it runs bundle adjustment repeatedly as images are added. Global SfM estimates all camera poses at once, which scales better, but it often falls short in accuracy and robustness.

The authors revisit the global SfM paradigm and present GLOMAP, a system that matches and in some cases surpasses the accuracy and robustness of state-of-the-art incremental methods such as COLMAP, while retaining the scalability and efficiency of global methods.

Key Contributions and Methodology

The paper identifies global translation averaging as the primary limitation of existing global SfM methods and as the cause of their accuracy and robustness gap relative to incremental approaches. The main contribution of GLOMAP is a global positioning step that jointly estimates camera and point positions, bypassing the ill-posed translation averaging step altogether.
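A minimal sketch of what such a joint objective can look like, assuming each observation of point k in camera i contributes a residual v_ik - d_ik (X_k - c_i). Here v_ik is the unit viewing ray in the world frame, X_k and c_i are the unknown point and camera positions, and d_ik >= 0 is a per-observation normalizing factor. The function names, the toy scene, and the plain squared loss (in place of a robust loss) are illustrative assumptions, not the authors' implementation.

```python
import math

def unit(vec):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(a * a for a in vec))
    return tuple(a / norm for a in vec)

def residual(ray, scale, point, center):
    """Normalized direction difference v - d * (X - c) for one observation."""
    return tuple(v - scale * (x - c) for v, x, c in zip(ray, point, center))

def total_cost(rays, scales, points, centers):
    """Sum of squared residuals over all (camera, point) observations."""
    cost = 0.0
    for (cam, pt), ray in rays.items():
        r = residual(ray, scales[(cam, pt)], points[pt], centers[cam])
        cost += sum(a * a for a in r)
    return cost

# Toy scene (hypothetical values): two cameras observing two points.
centers = {0: (0.0, 0.0, 0.0), 1: (1.0, 0.0, 0.0)}
points = {0: (0.0, 0.0, 2.0), 1: (1.0, 1.0, 3.0)}

# Observed unit rays, synthesized here from the ground-truth geometry.
rays = {
    (cam, pt): unit(tuple(x - c for x, c in zip(points[pt], centers[cam])))
    for cam in centers
    for pt in points
}

# At the true geometry, d = 1 / ||X - c|| makes every residual vanish.
scales = {
    (cam, pt): 1.0 / math.dist(points[pt], centers[cam])
    for cam in centers
    for pt in points
}

cost_at_truth = total_cost(rays, scales, points, centers)

# Moving one camera away from its true position raises the cost.
shifted = dict(centers)
shifted[1] = (1.5, 0.0, 0.0)
cost_shifted = total_cost(rays, scales, points, shifted)
```

The sketch only evaluates the cost. A full system would minimize it over the points, camera centers, and scale factors with a nonlinear least-squares solver. Note that the cost is unchanged by a global translation or rescaling of the scene, so the solution is only defined up to those freedoms.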

The GLOMAP system pipeline involves the following:

  1. Feature Track Construction: Initial feature correspondences are verified with two-view geometry, and outliers are filtered out before tracks are built.
  2. Global Positioning of Cameras and Points: This step integrates camera position and 3D point estimation within a single global optimization framework, leveraging normalized direction differences to ensure robustness and reliable convergence.
  3. Global Bundle Adjustment: Camera rotations, intrinsics, and 3D points are refined further.
  4. Camera Clustering: Post-processing step to handle non-overlapping image sets through camera covisibility graph analysis, ensuring coherent global reconstructions.
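
The camera clustering step above can be pictured as finding connected components in a covisibility graph. The sketch below assumes two cameras are linked when they share at least `min_shared_points` tracks. That threshold, the function name, and the toy data are hypothetical, and the actual system's criteria may differ.

```python
from collections import defaultdict
from itertools import combinations

def cluster_cameras(tracks, min_shared_points=2):
    """Group cameras into connected components of a covisibility graph.

    tracks maps a point id to the set of camera ids that observe it.
    """
    # Count how many points each pair of cameras sees in common.
    shared = defaultdict(int)
    cameras = set()
    for observers in tracks.values():
        cameras.update(observers)
        for a, b in combinations(sorted(observers), 2):
            shared[(a, b)] += 1

    # Keep only edges backed by enough shared points.
    adjacency = {cam: set() for cam in cameras}
    for (a, b), count in shared.items():
        if count >= min_shared_points:
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Depth-first search to collect each connected component.
    clusters, seen = [], set()
    for start in sorted(cameras):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            cam = stack.pop()
            if cam in seen:
                continue
            seen.add(cam)
            component.add(cam)
            stack.extend(adjacency[cam] - seen)
        clusters.append(sorted(component))
    return clusters

# Toy input: C and D share a single point, which is too weak to link them.
tracks = {
    "p1": {"A", "B"},
    "p2": {"A", "B", "C"},
    "p3": {"B", "C"},
    "p4": {"C", "D"},
    "p5": {"D", "E"},
    "p6": {"D", "E"},
}
clusters = cluster_cameras(tracks)
```

With this input the cameras split into two groups, A-B-C and D-E, each of which would be reported as its own reconstruction.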

Results and Performance

The authors perform extensive evaluations on multiple datasets, ranging from calibrated to uncalibrated and from unordered to sequential image collections.

  • On the ETH3D SLAM dataset, GLOMAP achieves about 8% higher recall than COLMAP and scores significantly better in AUC at the 0.1m and 0.5m thresholds.
  • For the ETH3D MVS (rig) dataset, GLOMAP shows consistent reconstruction success, outperforming other global SfM systems and achieving comparable accuracy to COLMAP but with significantly reduced computational costs.
  • The LaMAR dataset evaluations underscore GLOMAP’s robustness, delivering remarkably accurate reconstructions for large-scale indoor and outdoor scenes, outperforming both global SfM baselines and COLMAP in this challenging benchmark.
  • On uncalibrated datasets like IMC 2023 and MIP360, GLOMAP demonstrates substantial improvements in both accuracy and efficiency over other baselines, emphasizing its ability to handle unknown camera intrinsics effectively.
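
The summary does not spell out how the AUC scores above are computed. One common convention, assumed here, is the area under the recall-versus-error curve up to a threshold, normalized by that threshold so that a perfect result scores 1. The function name and the error values are hypothetical and are not results from the paper.

```python
def error_auc(errors, threshold):
    """Normalized area under the recall-vs-error curve up to `threshold`."""
    if not errors:
        return 0.0
    ordered = sorted(errors)
    n = len(ordered)

    # The curve starts at (0, 0); each error within the threshold
    # raises recall by 1 / n at that error value.
    xs, ys = [0.0], [0.0]
    for rank, err in enumerate(ordered, start=1):
        if err > threshold:
            break
        xs.append(err)
        ys.append(rank / n)

    # Extend the curve flat out to the threshold.
    xs.append(threshold)
    ys.append(ys[-1])

    # Trapezoidal integration, then normalize by the threshold.
    area = 0.0
    for i in range(1, len(xs)):
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0
    return area / threshold

# Hypothetical camera position errors in meters.
position_errors = [0.05, 0.2, 0.4, 2.0]
auc_at_01 = error_auc(position_errors, 0.1)
auc_at_05 = error_auc(position_errors, 0.5)
```

Under this convention, a tighter threshold rewards only the most accurate poses, which is why scores at 0.1m are typically lower than at 0.5m for the same set of errors.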

Implications and Future Directions

The implications of this research are significant for both practical applications and theoretical advancements in computer vision. Practically, the accelerated performance and scalability of GLOMAP make it highly suitable for real-world 3D reconstruction tasks, including large-scale mapping and novel-view synthesis. Theoretically, the novel joint optimization approach paves the way for further explorations into the integration of camera and point estimation in global SfM frameworks.

Future work might investigate hybrid approaches that combine the best aspects of incremental and global methods and are tailored to specific scenarios, such as scenes with high symmetry or severe occlusions. Integration with learning-based approaches could also improve robustness to challenging visual conditions and anomalies in datasets.

Conclusion

This paper introduces GLOMAP, which revisits global SfM and provides a robust, accurate, and efficient solution that narrows the gap between incremental and global approaches. The evaluations support these claims across calibrated and uncalibrated datasets. By open-sourcing the implementation, the authors make it easier for others to build on the work in research and applications.
