Stereo DSO: Large-Scale Direct Sparse Visual Odometry with Stereo Cameras
This paper presents Stereo Direct Sparse Odometry (Stereo DSO), a direct visual odometry method specifically designed for the accurate real-time estimation of motion and depth in large-scale environments using stereo cameras. The method addresses several of the limitations found in prior monocular and stereo visual odometry techniques, showcasing improvements in accuracy, robustness, and reconstruction density.
Stereo DSO is formulated within the direct sparse odometry (DSO) framework but introduces a significant enhancement by leveraging stereo information. It jointly optimizes the camera intrinsic and extrinsic parameters together with the depth values of selected pixels, integrating static stereo constraints from the fixed-baseline camera pair into the temporal multi-view stereo alignment. This integration effectively mitigates issues such as scale drift, large optical flow, and the rolling shutter effect, challenges that are notorious for direct image alignment methods.
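In the spirit of the DSO-style photometric error, the coupled objective can be sketched as follows; the notation (keyframe set F, point set P_i, observation set obs(p), coupling factor lambda) is illustrative and not a verbatim reproduction of the paper's equations:

    E_{\text{photo}} = \sum_{i \in \mathcal{F}} \sum_{\mathbf{p} \in \mathcal{P}_i}
      \Big( \sum_{j \in \mathrm{obs}(\mathbf{p})} E_{\mathbf{p}j}
      \;+\; \lambda \, E_{\mathbf{p}}^{\mathrm{static}} \Big)

Here each E term is a robustly weighted photometric residual over a small pixel neighborhood, the inner sum covers the temporal multi-view observations of point p, E_p^static is the residual against the simultaneously captured stereo image, and lambda balances the static stereo constraint against the temporal terms.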
Contribution and Methodology
The key contribution of Stereo DSO is the effective incorporation of static stereo constraints into the bundle adjustment pipeline. The method selectively samples pixels from regions of sufficient intensity gradient, enabling real-time optimization over the active keyframe window. This sparse sampling reduces the computational load without compromising precision, while static stereo provides accurate initial depth estimates and a fixed metric scale from the known camera baseline. These properties ensure rapid convergence, lessen the dependence on photometric calibration, and remove the scale drift that affects monocular systems.
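As a rough illustration of these two ingredients, the sketch below (Python with NumPy; the function names and threshold are hypothetical, not the paper's implementation) selects high-gradient pixels and initializes their depth from a stereo disparity map via the standard relation depth = f_x * baseline / disparity:

    import numpy as np

    def select_high_gradient_pixels(image, grad_threshold=12.0):
        # Candidate pixels where the intensity gradient magnitude is large enough.
        # The fixed threshold is illustrative; DSO-style selection adapts it per region.
        gy, gx = np.gradient(image.astype(np.float32))
        grad_mag = np.sqrt(gx * gx + gy * gy)
        vs, us = np.nonzero(grad_mag > grad_threshold)
        return list(zip(us, vs))                      # (u, v) pixel coordinates

    def init_depth_from_static_stereo(disparity, pixels, fx, baseline):
        # Depth initialization from the static stereo pair: depth = fx * b / d.
        depths = {}
        for u, v in pixels:
            d = disparity[v, u]
            if d > 0:                                 # skip pixels without a valid match
                depths[(u, v)] = fx * baseline / d
        return depths

In the actual system the selected points' depths are subsequently refined jointly with the keyframe poses in the windowed bundle adjustment; the snippet only illustrates why static stereo yields a metrically scaled, well-initialized starting point.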
The paper also details the technical design: temporal multi-view stereo constraints and static stereo constraints are combined in a joint bundle adjustment over the active window, while the method remains real-time capable. Variables that leave the window (old keyframes and points) are marginalized using the Schur complement, which keeps the problem size bounded and allows a comprehensive yet efficient parameter update cycle.
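For reference, marginalizing a block of variables x_b from a Gauss-Newton system H delta_x = b via the Schur complement takes the standard form below; this is textbook linear algebra rather than a quotation of the paper:

    \begin{bmatrix} H_{aa} & H_{ab} \\ H_{ba} & H_{bb} \end{bmatrix}
    \begin{bmatrix} \delta x_a \\ \delta x_b \end{bmatrix}
    =
    \begin{bmatrix} b_a \\ b_b \end{bmatrix}
    \quad\Rightarrow\quad
    \big( H_{aa} - H_{ab} H_{bb}^{-1} H_{ba} \big)\, \delta x_a
    = b_a - H_{ab} H_{bb}^{-1} b_b

The reduced system retains the information contributed by the marginalized variables as a prior on the remaining ones, so accuracy is preserved while the active optimization stays small.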
Experimental Results
Extensive evaluations on the KITTI odometry benchmark demonstrate that Stereo DSO surpasses state-of-the-art visual odometry methods, such as ORB-SLAM2 and Stereo LSD-SLAM, in both accuracy and robustness. The proposed method delivers lower translational and rotational errors across the sequences, and its advantage is particularly notable on the KITTI testing set, where it shows superior performance even without loop closure or global optimization. This indicates an enhanced ability to generalize to unknown environments.
On the Cityscapes dataset, Stereo DSO performs effectively, despite challenges like rolling shutter effects and uncalibrated brightness changes. The system's ability to handle high dynamic range scenes and varying frame rates without dedicated rolling shutter calibration highlights its robustness. The results underscore its applicability in real-world, uncontrolled environments, despite some performance degradation under extreme brightness changes combined with rapid motion.
Implications and Future Prospects
This robustness suggests potential applications in domains requiring accurate, real-time environmental perception, such as autonomous driving and drone navigation. Although the optimization is sparse, Stereo DSO delivers a detailed metric 3D reconstruction that compares favorably with prior dense and semi-dense approaches, offering a sufficiently granular and reliable representation for practical use cases.
The paper suggests future directions, such as the integration of map maintenance and loop closure to transform Stereo DSO into a complete SLAM system. Additionally, addressing dynamic object interference could further enhance the method's robustness.
In conclusion, Stereo DSO represents a substantial step forward in stereo-based direct visual odometry through its novel methodology and demonstrated performance improvements, offering a foundation for further research and application in complex real-world environments.