
Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry (1807.02570v2)

Published 6 Jul 2018 in cs.CV

Abstract: Monocular visual odometry approaches that purely rely on geometric cues are prone to scale drift and require sufficient motion parallax in successive frames for motion estimation and 3D reconstruction. In this paper, we propose to leverage deep monocular depth prediction to overcome limitations of geometry-based monocular visual odometry. To this end, we incorporate deep depth predictions into Direct Sparse Odometry (DSO) as direct virtual stereo measurements. For depth prediction, we design a novel deep network that refines predicted depth from a single image in a two-stage process. We train our network in a semi-supervised way on photoconsistency in stereo images and on consistency with accurate sparse depth reconstructions from Stereo DSO. Our deep predictions excel state-of-the-art approaches for monocular depth on the KITTI benchmark. Moreover, our Deep Virtual Stereo Odometry clearly exceeds previous monocular and deep learning based methods in accuracy. It even achieves comparable performance to the state-of-the-art stereo methods, while only relying on a single camera.

Citations (324)

Summary

  • The paper presents DVSO, which integrates deep depth prediction into Direct Sparse Odometry to significantly reduce scale drift in monocular visual odometry.
  • It employs a novel StackNet architecture with SimpleNet and ResidualNet to refine disparity predictions for improved depth accuracy.
  • Experimental results on the KITTI dataset show that DVSO achieves stereo-level accuracy while outperforming traditional monocular methods.

Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry

The paper "Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry" (DVSO) addresses monocular visual odometry for applications such as autonomous navigation and augmented reality. The authors incorporate deep learning-based monocular depth prediction into a traditional geometric pipeline to address two weaknesses of purely geometric monocular methods: scale drift and the need for sufficient motion parallax between successive frames.

Introduction

Traditional monocular visual odometry (VO) methods that rely purely on geometric cues are prone to scale drift, because a single camera cannot observe metric scale. Stereo methods resolve this limitation, but they require a second camera and stereo calibration, making them less desirable for some applications. DVSO instead uses a neural network to predict depth from a single image and feeds those predictions into Direct Sparse Odometry (DSO), both for depth initialization and as virtual stereo measurements.

Figure 1: DVSO achieves monocular visual odometry on KITTI on par with state-of-the-art stereo methods. It uses deep-learning based left-right disparity predictions (lower left) for initialization and virtual stereo constraints in an optimization-based direct visual odometry pipeline. This allows for recovering accurate metric estimates.

Network Architecture and Training

The novel network architecture, StackNet, refines depth predictions in two stages using fully convolutional encoder-decoder subnetworks: SimpleNet and ResidualNet.

  • SimpleNet: Utilizes a ResNet-50 based encoder and skip connections for high-resolution disparity map predictions.
  • ResidualNet: Refines disparity estimates from SimpleNet using an additive residual signal.
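The additive refinement in the second stage can be illustrated with a minimal sketch. It assumes the disparity map and the residual are plain row-major lists of the same shape. The function name `refine_disparity` and the clamp at zero are illustrative assumptions, not details taken from the paper, and the networks that would produce both inputs are omitted.

```python
from typing import List

Disparity = List[List[float]]  # row-major disparity map, values in pixels


def refine_disparity(simple_disp: Disparity, residual: Disparity) -> Disparity:
    """Add a residual map to a coarse disparity map, element by element.

    `simple_disp` stands in for the first-stage output and `residual` for
    the second-stage correction (both hypothetical inputs here).
    """
    if len(simple_disp) != len(residual):
        raise ValueError("maps must have the same number of rows")
    refined = []
    for coarse_row, res_row in zip(simple_disp, residual):
        if len(coarse_row) != len(res_row):
            raise ValueError("rows must have the same length")
        # Assumption: disparities are non-negative, so clamp the sum at zero.
        refined.append([max(0.0, c + r) for c, r in zip(coarse_row, res_row)])
    return refined
```

Because the second stage only has to predict a correction, a residual of zero leaves the first-stage prediction unchanged.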

Training is semi-supervised: the network is trained on photometric consistency between stereo images and on consistency with accurate sparse depth reconstructions from Stereo DSO.

Figure 2: Overview of StackNet architecture.
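A semi-supervised objective of this kind can be sketched as a weighted sum of a photometric term and a sparse supervised term. The sketch below is a simplification under stated assumptions: images and disparities are flattened to 1D lists, both terms use a plain L1 error, missing sparse labels are encoded as `None`, and the weights `w_photo` and `w_sparse` are hypothetical. The paper's actual loss terms and weights are not given in this summary.

```python
from typing import List, Optional, Sequence


def l1_photometric(target: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean absolute intensity difference between an image and its reconstruction."""
    if not target or len(target) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - r) for t, r in zip(target, reconstructed)) / len(target)


def sparse_supervised(pred_disp: Sequence[float],
                      sparse_disp: Sequence[Optional[float]]) -> float:
    """Mean absolute disparity error, evaluated only where a sparse label exists."""
    errors: List[float] = [abs(p - s) for p, s in zip(pred_disp, sparse_disp)
                           if s is not None]
    # With no labelled pixels the supervised term contributes nothing.
    return sum(errors) / len(errors) if errors else 0.0


def semi_supervised_loss(target: Sequence[float],
                         reconstructed: Sequence[float],
                         pred_disp: Sequence[float],
                         sparse_disp: Sequence[Optional[float]],
                         w_photo: float = 1.0,
                         w_sparse: float = 1.0) -> float:
    """Weighted sum of the photometric and sparse supervised terms (assumed form)."""
    return (w_photo * l1_photometric(target, reconstructed)
            + w_sparse * sparse_supervised(pred_disp, sparse_disp))
```

The photometric term is evaluated at every pixel, while the supervised term only touches pixels that have a sparse reconstruction, so the two signals complement each other.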

Deep Virtual Stereo Odometry Implementation

DVSO integrates deep monocular depth predictions into the monocular DSO pipeline in two ways: to initialize depth and to formulate virtual stereo image alignment constraints. The optimization includes a novel virtual stereo term in bundle adjustment that keeps the estimated depth consistent with the predicted disparities.

Figure 3: System overview of DVSO. Every new frame is used for visual odometry and fed into the proposed StackNet to predict left and right disparity. The predicted left and right disparities are used for depth initialization, while the right disparity is used to form the virtual stereo term in direct sparse bundle adjustment.
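The link between a predicted disparity and a metric depth follows the standard rectified-stereo relation, depth = focal length × baseline / disparity. The sketch below uses that relation for depth initialization and for a simplified consistency residual. Note the substitution: the summary describes the virtual stereo term as an image alignment constraint, whereas this sketch compares disparities directly, which is a simpler geometric stand-in and not the paper's formulation. All function names and the focal length and baseline values in the usage note are illustrative assumptions.

```python
def disparity_to_depth(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Metric depth from a rectified-stereo disparity: z = f * b / d."""
    if disparity_px <= 0.0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_to_disparity(depth_m: float, focal_px: float, baseline_m: float) -> float:
    """Disparity that a point at the given depth would produce: d = f * b / z."""
    if depth_m <= 0.0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m


def disparity_consistency_residual(depth_m: float,
                                   predicted_disparity_px: float,
                                   focal_px: float,
                                   baseline_m: float) -> float:
    """Simplified stand-in for a virtual stereo constraint.

    Returns the difference between the disparity implied by the current
    depth estimate and the network-predicted disparity. A residual of zero
    means the estimate agrees with the prediction.
    """
    implied = depth_to_disparity(depth_m, focal_px, baseline_m)
    return implied - predicted_disparity_px
```

For example, with an assumed focal length of 720 px and an assumed virtual baseline of 0.5 m, a predicted disparity of 36 px initializes a point at 10 m. If the optimizer later moves that point to 12 m, the residual becomes 30 − 36 = −6 px, which pulls the depth back toward the prediction.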

Experimental Results

DVSO was evaluated on the KITTI dataset. It improved clearly on traditional monocular methods and achieved results comparable to stereo VO systems. It also outperformed end-to-end deep learning VO systems while relying on a single camera.

  • Monocular VO: DVSO reduces scale drift significantly and outperforms previous monocular methods on the KITTI benchmark.
  • Monocular Depth Prediction: StackNet yields more accurate depth estimates on KITTI than both supervised and self-supervised state-of-the-art methods.

Figure 4: Qualitative comparison with state-of-the-art methods. The ground truth is interpolated for better visualization. Our approach shows better prediction on thin structures than the self-supervised approach of Godard et al.
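Quantitative depth comparisons on KITTI are commonly reported with error metrics such as absolute relative error, root mean squared error, and the fraction of pixels whose prediction is within a factor of 1.25 of the ground truth. The sketch below computes these three under the assumption that depths are flattened to 1D lists, that predictions are positive, and that pixels without ground truth are encoded as `None`. Which metrics the paper itself reports is not stated in this summary, so the selection here is an assumption.

```python
import math
from typing import Dict, Optional, Sequence


def depth_metrics(pred: Sequence[float],
                  gt: Sequence[Optional[float]]) -> Dict[str, float]:
    """Abs Rel, RMSE and delta < 1.25 accuracy over pixels with valid ground truth."""
    # Keep only pixels that have a positive ground-truth depth.
    pairs = [(p, g) for p, g in zip(pred, gt) if g is not None and g > 0.0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # A pixel counts as accurate if prediction and truth differ by less than 25%
    # in either direction (assumes positive predictions).
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}
```

Lower is better for `abs_rel` and `rmse`, while higher is better for `delta1`.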

Conclusion

DVSO enhances monocular visual odometry by integrating deep depth predictions into a geometric framework, which reduces scale drift and brings accuracy on KITTI to a level comparable with stereo methods while using a single camera. Future directions may include fine-tuning the network within the odometry pipeline to adapt to different environments and camera setups, which would make the method more broadly applicable to monocular navigation and mapping tasks.

Figure 5: Qualitative results on Eigen et al.'s KITTI Raw test split. The result of Godard et al. highlights superior prediction quality compared to other methods.
