RelPose++: Recovering 6D Poses from Sparse-view Observations (2305.04926v2)

Published 8 May 2023 in cs.CV

Abstract: We address the task of estimating 6D camera poses from sparse-view image sets (2-8 images). This task is a vital pre-processing stage for nearly all contemporary (neural) reconstruction algorithms but remains challenging given sparse views, especially for objects with visual symmetries and texture-less surfaces. We build on the recent RelPose framework which learns a network that infers distributions over relative rotations over image pairs. We extend this approach in two key ways; first, we use attentional transformer layers to process multiple images jointly, since additional views of an object may resolve ambiguous symmetries in any given image pair (such as the handle of a mug that becomes visible in a third view). Second, we augment this network to also report camera translations by defining an appropriate coordinate system that decouples the ambiguity in rotation estimation from translation prediction. Our final system results in large improvements in 6D pose prediction over prior art on both seen and unseen object categories and also enables pose estimation and 3D reconstruction for in-the-wild objects.

References (65)
  1. Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild with Pose Annotations. In CVPR, 2021.
  2. NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation. In ICLR, 2021.
  3. SURF: Speeded Up Robust Features. In ECCV, 2006.
  4. Extreme Rotation Estimation using Dense Correlation Volumes. In CVPR, 2021.
  5. ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM. T-RO, 2021.
  6. ShapeNet: An Information-Rich 3D Model Repository. arXiv preprint arXiv:1512.03012, 2015.
  7. Wide-Baseline Relative Camera Pose Estimation with Directional Learning. In CVPR, 2021.
  8. Universal Correspondence Network. NeurIPS, 2016.
  9. MonoSLAM: Real-time Single Camera SLAM. TPAMI, 2007.
  10. SuperPoint: Self-supervised Interest Point Detection and Description. In CVPR-W, 2018.
  11. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Communications of the ACM, 1981.
  12. Deep Orientation Uncertainty Learning Based on a Bingham Loss. In ICLR, 2019.
  13. Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
  14. Rotation Averaging. IJCV, 2013.
  15. Deep Residual Learning for Image Recognition. In CVPR, 2016.
  16. Few-View Object Reconstruction with Unknown Categories and Camera Poses. arXiv preprint arXiv:2212.04492, 2022.
  17. End-to-end Recovery of Human Shape and Pose. In CVPR, 2018.
  18. Learning 3D Human Dynamics from Video. In CVPR, 2019.
  19. VIBE: Video Inference for Human Body Pose and Shape Estimation. In CVPR, 2020.
  20. BARF: Bundle-Adjusting Neural Radiance Fields. In ICCV, 2021.
  21. SIFT Flow: Dense Correspondence Across Scenes and Its Applications. TPAMI, 2010.
  22. H Christopher Longuet-Higgins. A Computer Algorithm for Reconstructing a Scene from Two Projections. Nature, 1981.
  23. David G Lowe. Distinctive Image Features from Scale-invariant Keypoints. IJCV, 2004.
  24. An Iterative Image Registration Technique with an Application to Stereo Vision. In IJCAI, 1981.
  25. MediaPipe: A Framework for Building Perception Pipelines. arXiv:1906.08172, 2019.
  26. Virtual Correspondence: Humans as a Cue for Extreme-View Geometry. In CVPR, 2022.
  27. Explaining the Ambiguity of Object Detection and 6D Pose from Visual Data. In ICCV, 2019.
  28. VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera. TOG, 2017.
  29. Relative Camera Pose Estimation Using Convolutional Neural Networks. In ACIVS, 2017.
  30. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. In ECCV, 2020.
  31. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras. T-RO, 2017.
  32. ORB-SLAM: A Versatile and Accurate Monocular SLAM System. T-RO, 2015.
  33. Implicit-PDF: Non-Parametric Representation of Probability Distributions on the Rotation Manifold. In ICML, 2021.
  34. PIZZA: A Powerful Image-only Zero-Shot Zero-CAD Approach to 6 DoF Tracking. In 3DV, 2022.
  35. David Nistér. An Efficient Solution to the Five-point Relative Pose Problem. TPAMI, 2004.
  36. Learning 3D Object Categories by Looking Around Them. In ICCV, 2017.
  37. Learning Orientation Distributions for Object Pose Estimation. In IROS, 2020.
  38. ZePHyR: Zero-shot Pose Hypothesis Scoring. In ICRA, 2021.
  39. Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction. In ICCV, 2021.
  40. The 8-Point Algorithm as an Inductive Bias for Relative Pose Prediction by ViTs. In 3DV, 2022.
  41. From Coarse to Fine: Robust Hierarchical Localization at Large Scale. In CVPR, 2019.
  42. SuperGlue: Learning Feature Matching with Graph Neural Networks. In CVPR, 2020.
  43. Structure-from-Motion Revisited. In CVPR, 2016.
  44. Pixelwise View Selection for Unstructured Multi-View Stereo. In ECCV, 2016.
  45. SparsePose: Sparse-View Camera Pose Regression and Refinement. In CVPR, 2023.
  46. A Benchmark for the Evaluation of RGB-D SLAM Systems. In IROS, 2012.
  47. Canonical Capsules: Self-supervised Capsules in Canonical Pose. In NeurIPS, 2021.
  48. DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras. NeurIPS, 2021.
  49. Bundle Adjustment—A Modern Synthesis. In International workshop on vision algorithms, 1999.
  50. Shinji Umeyama. Least-squares Estimation of Transformation Parameters Between Two Point Patterns. TPAMI, 1991.
  51. MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision. In CVPR, 2022.
  52. Attention is All You Need. NeurIPS, 2017.
  53. PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment. In ICCV, 2023.
  54. DeepVO: Towards End-to-End Visual Odometry with Deep Recurrent Convolutional Neural Networks. In ICRA, 2017.
  55. SegICP: Integrated Deep Semantic Segmentation and Pose Estimation. In IROS, 2017.
  56. Pose from Shape: Deep Pose Estimation for Arbitrary 3D Objects. In BMVC, 2019.
  57. PoseContrast: Class-Agnostic Object Viewpoint Estimation in the Wild with Pose-Aware Contrastive Learning. In 3DV, 2021.
  58. D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry. In CVPR, 2020.
  59. pixelNeRF: Neural Radiance Fields from One or Few Images. In CVPR, 2021.
  60. Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild. In ECCV, 2020.
  61. NeRS: Neural Reflectance Surfaces for Sparse-view 3D Reconstruction in the Wild. In NeurIPS, 2021.
  62. RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild. In ECCV, 2022.
  63. Richard Zhang. Making Convolutional Networks Shift-Invariant Again. In ICML, 2019.
  64. Stereo Magnification: Learning View Synthesis using Multiplane Images. SIGGRAPH, 2018.
  65. SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction. In CVPR, 2023.

Summary

  • The paper introduces a transformer-based multi-view framework that accurately estimates 6D camera poses from as few as 2 images.
  • The method decouples rotation and translation tasks using a novel world coordinate system, achieving a 10% improvement in rotation accuracy and enhanced translation predictions.
  • Evaluations demonstrate significant performance gains over methods like COLMAP, enabling high-fidelity sparse-view 3D reconstructions in real-world scenarios.

RelPose++: Sparse-View 6D Pose Estimation

Introduction

In this exploration, we analyze the capabilities of RelPose++, a robust framework for recovering 6D camera poses from sparse sets of 2 to 8 images. RelPose++ builds on the recent RelPose framework, addressing its limitations by using transformer layers to incorporate multi-view cues and by extending the network to also predict camera translations. The paper demonstrates significant improvements in pose accuracy, benefiting downstream applications such as 3D reconstruction.

Methodology

Multi-View Rotation and Translation

RelPose++ extends the RelPose framework by introducing a transformer-based module that processes multiple images simultaneously. This multi-view integration allows the system to resolve rotational ambiguities inherent in image pairs, notably improving estimation accuracy for objects with symmetric features, such as a mug whose handle is obscured in some views (Figure 1).

Figure 1: Overview of RelPose++. We present RelPose++, a method for sparse-view camera pose estimation. RelPose++ starts by extracting global image features using a ResNet-50.
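To make the joint processing concrete, here is a minimal PyTorch sketch of the idea described above (not the authors' implementation; the module structure, feature dimension, and layer counts are assumptions): per-image global features from a ResNet-50 are projected to tokens and passed through a shared transformer encoder, so that each view's representation can attend to every other view.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class MultiViewEncoder(nn.Module):
    """Minimal sketch of joint multi-view feature processing (illustrative only)."""
    def __init__(self, d_model=512, n_heads=8, n_layers=4):
        super().__init__()
        backbone = models.resnet50(weights=None)
        # Drop the classification head; keep global-average-pooled features.
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])
        self.proj = nn.Linear(2048, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)

    def forward(self, images):                                    # images: (B, N, 3, H, W)
        B, N = images.shape[:2]
        feats = self.backbone(images.flatten(0, 1)).flatten(1)    # (B*N, 2048)
        tokens = self.proj(feats).view(B, N, -1)                  # (B, N, d_model)
        # Self-attention lets additional views resolve symmetries that are
        # ambiguous in any single image pair (e.g., a mug's hidden handle).
        return self.transformer(tokens)                           # (B, N, d_model)

# Example: fuse features from five images of the same object.
# fused = MultiViewEncoder()(torch.randn(1, 5, 3, 224, 224))
```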

The framework scores relative rotations with an energy-based model similar to RelPose, then recovers a consistent set of global rotations by chaining the most confident pairwise estimates along a maximum spanning tree and refining them with coordinate ascent, as sketched below.
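The global-rotation assembly can be sketched as follows, under stated assumptions: `score(i, j, R)` is a hypothetical stand-in for the learned energy of a candidate relative rotation between views i and j, and `candidates` is a set of sampled rotation hypotheses. The sketch initializes global rotations along a maximum spanning tree of pairwise confidences, then performs a few rounds of coordinate ascent.

```python
import numpy as np
import networkx as nx
from scipy.spatial.transform import Rotation

def assemble_global_rotations(n_views, score, candidates, n_rounds=3):
    """Sketch of MST initialization + coordinate ascent (not the authors' code).

    score(i, j, R) -> float : assumed energy of relative rotation R between views i, j
    candidates              : list of 3x3 rotation hypotheses to search over
    """
    # Best pairwise relative rotation and its score for every image pair.
    best_R, G = {}, nx.Graph()
    for i in range(n_views):
        for j in range(i + 1, n_views):
            scores = [score(i, j, R) for R in candidates]
            k = int(np.argmax(scores))
            best_R[(i, j)] = candidates[k]
            G.add_edge(i, j, weight=scores[k])

    # Chain the most confident pairwise estimates along a maximum spanning tree.
    R_global = {0: np.eye(3)}
    for i, j in nx.bfs_edges(nx.maximum_spanning_tree(G), 0):
        R_ij = best_R[(i, j)] if i < j else best_R[(j, i)].T
        R_global[j] = R_ij @ R_global[i]             # convention: R_j = R_ij @ R_i

    # Coordinate ascent: re-pick each camera's rotation to maximize pairwise scores.
    for _ in range(n_rounds):
        for j in range(1, n_views):
            def total(R_j):
                return sum(score(i, j, R_j @ R_global[i].T)
                           for i in range(n_views) if i != j)
            R_global[j] = max(candidates, key=total)
    return R_global

# Example candidate set: candidates = [Rotation.random().as_matrix() for _ in range(500)]
```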

Translation Prediction

A central extension in RelPose++ is its capacity to predict camera translations. It defines a world coordinate system centered at the (approximate) intersection of the cameras' optical axes, a choice that decouples translation prediction from rotation estimation (Figure 2).

Figure 2: Coordinate Systems for Estimating Camera Translation. This helps decouple the task of predicting camera translations from rotations.

This approach circumvents the limitations of using the first camera as the frame origin, thereby stabilizing predictions even in cases of symmetric ambiguities.
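The look-at center can be computed in closed form as the least-squares intersection of the cameras' optical axes. The numpy sketch below is one plausible construction (an illustration, not necessarily the paper's exact formulation): each camera contributes a ray from its center along its viewing direction, and the origin is the point minimizing the summed squared distance to all rays.

```python
import numpy as np

def optical_axis_intersection(centers, directions):
    """Least-squares 'intersection' of camera optical axes (minimal sketch).

    centers    : (N, 3) camera centers in world coordinates
    directions : (N, 3) viewing directions (optical axes)
    Returns the 3D point minimizing summed squared distance to every axis.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, d in zip(centers, directions):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)   # projector onto the plane orthogonal to d
        A += P
        b += P @ c
    return np.linalg.solve(A, b)

# Centering the world frame at this point (roughly the object center) means an
# ambiguous rotation for one camera no longer corrupts its predicted translation.
```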

Evaluation

Quantitative Results

RelPose++ demonstrates significant improvements in 6D pose prediction over alternatives such as COLMAP and PoseDiffusion, particularly in scenarios with object symmetry or little texture (Table 1). It consistently outperforms these baselines on both rotation and translation metrics, across seen and unseen object categories.

  • Rotation Accuracy: The method improves accuracy at a 15° error threshold by roughly 10% over prior art, including on unseen object categories.
  • Translation Accuracy: Using the look-at-centered coordinate system, RelPose++ predicts camera translations that are measurably more accurate; predicted and ground-truth camera centers are compared after applying an optimal similarity transform (see the sketch after this list).
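The following numpy sketch shows how such metrics are commonly computed (thresholds and alignment details here are assumptions rather than the paper's exact evaluation code): relative-rotation error is the geodesic angle between predicted and ground-truth relative rotations, reported as accuracy within 15°, and predicted camera centers are compared to ground truth after a best-fit similarity (Umeyama) alignment.

```python
import numpy as np

def rotation_accuracy(R_pred, R_gt, thresh_deg=15.0):
    """Fraction of image pairs whose relative-rotation error is below thresh_deg."""
    errs = []
    n = len(R_pred)
    for i in range(n):
        for j in range(i + 1, n):
            rel_pred = R_pred[j] @ R_pred[i].T
            rel_gt = R_gt[j] @ R_gt[i].T
            cos = (np.trace(rel_pred @ rel_gt.T) - 1.0) / 2.0
            errs.append(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    return float(np.mean(np.array(errs) <= thresh_deg))

def umeyama_alignment(src, dst):
    """Best-fit similarity (s, R, t) with dst_i ≈ s * R @ src_i + t (Umeyama, 1991)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)                 # 3x3 cross-covariance
    U, d, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                               # handle reflections
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(d) @ S) / var_src
    t = mu_d - s * R @ mu_s
    return s, R, t

# Translation error: align predicted camera centers to ground truth, then compare.
# s, R, t = umeyama_alignment(pred_centers, gt_centers)
# err = np.linalg.norm((s * (R @ pred_centers.T).T + t) - gt_centers, axis=1)
```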

Qualitative Analysis

Qualitative results indicate that RelPose++ generalizes to real-world, in-the-wild captures, such as self-captured images (Figure 3). Its ability to initialize high-fidelity sparse-view 3D reconstructions further underscores its utility in practical applications (Figure 4).

Figure 3: Recovered Camera Poses from In-the-Wild Images.


Figure 4: Sparse-view 3D Reconstruction using NeRS.

Discussion

RelPose++ provides a robust mechanism for recovering sparse-view camera poses, showing strong generalization and accuracy. While the method currently targets offline processing, further refinements could open the door to real-time use. Integrating it into existing 3D reconstruction pipelines could drive improvements in applications that demand precise spatial awareness.

Conclusion

The advancements in RelPose++ mark a substantial step toward general sparse-view pose estimation, introducing a method that effectively separates rotation and translation estimation for improved accuracy. Future avenues include deploying these strategies in dynamic environments and fusing them with real-time systems, with potential impact across robotics, AR/VR, and beyond.
