Stereo DSO: Large-Scale Direct Sparse Visual Odometry with Stereo Cameras
This paper presents Stereo Direct Sparse Odometry (Stereo DSO), a direct visual odometry method specifically designed for the accurate real-time estimation of motion and depth in large-scale environments using stereo cameras. The method addresses several of the limitations found in prior monocular and stereo visual odometry techniques, showcasing improvements in accuracy, robustness, and reconstruction density.
Stereo DSO is formulated within the direct sparse odometry (DSO) framework but introduces a significant enhancement by leveraging stereo information. It jointly optimizes the camera intrinsic and extrinsic parameters together with the depth values of selected pixels, integrating static stereo constraints from the fixed-baseline camera pair into the temporal multi-view stereo alignment. This integration effectively mitigates issues such as scale drift, large optical flow, and the rolling shutter effect, challenges that are notorious for direct image alignment methods.
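In the spirit of the DSO-style photometric error, the coupled objective can be sketched as follows; the notation (keyframe set F, point set P_i, observation set obs(p), coupling factor lambda) is illustrative and not a verbatim reproduction of the paper's equations:

    E_{\text{photo}} = \sum_{i \in \mathcal{F}} \sum_{\mathbf{p} \in \mathcal{P}_i}
      \Big( \sum_{j \in \mathrm{obs}(\mathbf{p})} E_{\mathbf{p}j}
      \;+\; \lambda \, E_{\mathbf{p}}^{\mathrm{static}} \Big)

Here each E term is a robustly weighted photometric residual over a small pixel neighborhood, the inner sum covers the temporal multi-view observations of point p, E_p^static is the residual against the simultaneously captured stereo image, and lambda balances the static stereo constraint against the temporal terms.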
Contribution and Methodology
The key contribution of Stereo DSO is the effective incorporation of static stereo constraints into the bundle adjustment pipeline. The method selectively samples pixels from regions of sufficient intensity gradient, enabling real-time optimization over the active keyframe window. This sparse sampling reduces the computational load without compromising precision, while static stereo provides accurate initial depth estimates and a fixed metric scale from the known camera baseline. These properties ensure rapid convergence, lessen the dependence on photometric calibration, and remove the scale drift that affects monocular systems.
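As a rough illustration of these two ingredients, the sketch below (Python with NumPy; the function names and threshold are hypothetical, not the paper's implementation) selects high-gradient pixels and initializes their depth from a stereo disparity map via the standard relation depth = f_x * baseline / disparity:

    import numpy as np

    def select_high_gradient_pixels(image, grad_threshold=12.0):
        # Candidate pixels where the intensity gradient magnitude is large enough.
        # The fixed threshold is illustrative; DSO-style selection adapts it per region.
        gy, gx = np.gradient(image.astype(np.float32))
        grad_mag = np.sqrt(gx * gx + gy * gy)
        vs, us = np.nonzero(grad_mag > grad_threshold)
        return list(zip(us, vs))                      # (u, v) pixel coordinates

    def init_depth_from_static_stereo(disparity, pixels, fx, baseline):
        # Depth initialization from the static stereo pair: depth = fx * b / d.
        depths = {}
        for u, v in pixels:
            d = disparity[v, u]
            if d > 0:                                 # skip pixels without a valid match
                depths[(u, v)] = fx * baseline / d
        return depths

In the actual system the selected points' depths are subsequently refined jointly with the keyframe poses in the windowed bundle adjustment; the snippet only illustrates why static stereo yields a metrically scaled, well-initialized starting point.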
The paper also details the technical design: temporal multi-view stereo constraints and static stereo constraints are combined in a joint bundle adjustment over the active window, while the method remains real-time capable. Variables that leave the window (old keyframes and points) are marginalized using the Schur complement, which keeps the problem size bounded and allows a comprehensive yet efficient parameter update cycle.
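For reference, marginalizing a block of variables x_b from a Gauss-Newton system H delta_x = b via the Schur complement takes the standard form below; this is textbook linear algebra rather than a quotation of the paper:

    \begin{bmatrix} H_{aa} & H_{ab} \\ H_{ba} & H_{bb} \end{bmatrix}
    \begin{bmatrix} \delta x_a \\ \delta x_b \end{bmatrix}
    =
    \begin{bmatrix} b_a \\ b_b \end{bmatrix}
    \quad\Rightarrow\quad
    \big( H_{aa} - H_{ab} H_{bb}^{-1} H_{ba} \big)\, \delta x_a
    = b_a - H_{ab} H_{bb}^{-1} b_b

The reduced system retains the information contributed by the marginalized variables as a prior on the remaining ones, so accuracy is preserved while the active optimization stays small.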
Experimental Results
Extensive evaluations on the KITTI odometry benchmark demonstrate that Stereo DSO surpasses state-of-the-art visual odometry methods, such as ORB-SLAM2 and Stereo LSD-SLAM, in both accuracy and robustness. The proposed method delivers lower translational and rotational errors across the sequences, and its advantage is particularly notable on the KITTI testing set, where it shows superior performance even without loop closure or global optimization. This indicates an enhanced ability to generalize to unknown environments.
On the Cityscapes dataset, Stereo DSO performs effectively, despite challenges like rolling shutter effects and uncalibrated brightness changes. The system's ability to handle high dynamic range scenes and varying frame rates without dedicated rolling shutter calibration highlights its robustness. The results underscore its applicability in real-world, uncontrolled environments, despite some performance degradation under extreme brightness changes combined with rapid motion.
Implications and Future Prospects
This robustness suggests potential applications in domains requiring accurate, real-time environmental perception, such as autonomous driving and drone navigation. Although the optimization is sparse, Stereo DSO delivers a detailed metric 3D reconstruction that compares favorably with prior dense and semi-dense approaches, offering a sufficiently granular and reliable representation for practical use cases.
The paper suggests future directions, such as the integration of map maintenance and loop closure to transform Stereo DSO into a complete SLAM system. Additionally, addressing dynamic object interference could further enhance the method's robustness.
In conclusion, Stereo DSO represents a substantial step forward in stereo-based direct visual odometry through its novel methodology and demonstrated performance improvements, offering a foundation for further research and application in complex real-world environments.