Wasserstein Distances for Stereo Disparity Estimation (2007.03085v2)

Published 6 Jul 2020 in cs.CV and cs.LG

Abstract: Existing approaches to depth or disparity estimation output a distribution over a set of pre-defined discrete values. This leads to inaccurate results when the true depth or disparity does not match any of these values. The fact that this distribution is usually learned indirectly through a regression loss causes further problems in ambiguous regions around object boundaries. We address these issues using a new neural network architecture that is capable of outputting arbitrary depth values, and a new loss function that is derived from the Wasserstein distance between the true and the predicted distributions. We validate our approach on a variety of tasks, including stereo disparity and depth estimation, and the downstream 3D object detection. Our approach drastically reduces the error in ambiguous regions, especially around object boundaries that greatly affect the localization of objects in 3D, achieving the state-of-the-art in 3D object detection for autonomous driving. Our code will be available at https://github.com/Div99/W-Stereo-Disp.

Authors (6)

Divyansh Garg
Yan Wang
Bharath Hariharan
Mark Campbell
Kilian Q. Weinberger
Wei-Lun Chao

Citations (69)

Summary

The paper presents a novel neural network architecture that predicts continuous disparity values to improve depth estimation.
It utilizes a Wasserstein distance-based loss function to align predicted depth distributions with ground truth, especially at object boundaries.
Experimental results on KITTI and Scene Flow benchmarks demonstrate significant error reduction and enhanced 3D object detection performance.

An Evaluation of Wasserstein Distances for Stereo Disparity Estimation

This paper presents an innovative approach to stereo disparity estimation, tackling the challenges associated with predicting depth from stereo images. The primary contribution lies in enhancing the precision of depth estimates, particularly around object boundaries, which is critical for applications such as 3D object detection in autonomous driving.

Key Contributions

The authors address a limitation of existing stereo disparity estimation techniques, which output a distribution over a set of pre-defined discrete values. Such methods are inaccurate when the true depth or disparity does not match any of these values, and learning the distribution indirectly through a regression loss causes further problems in ambiguous regions around object boundaries. The paper proposes:

Neural Network Architecture: A network that can output arbitrary depth values. It overcomes the restriction to integer disparity values by predicting a real-valued offset, in addition to a probability, for each disparity value in a discrete set. The predicted distribution can therefore place mass at continuous disparity values, which improves accuracy, particularly around object boundaries.
Loss Function Based on Wasserstein Distance: A new loss function derived from the Wasserstein distance measures the discrepancy between the predicted and true distributions. Supervising the distribution directly, rather than indirectly through a regression loss, keeps the training signal meaningful in multi-modal cases where more than one value may be valid for a single pixel.
Demonstrable Performance Improvements: The proposed approach substantially reduces the error at object boundaries, improves 3D localization accuracy, and achieves state-of-the-art 3D object detection for autonomous driving, with experiments on the KITTI and Scene Flow benchmarks.
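
The idea of pairing each discrete disparity with a probability and a real-valued offset can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' implementation: the function names, the toy numbers, and the choice to read out the most probable bin plus its offset are assumptions made here for illustration.

```python
import math

def softmax(logits):
    """Turn raw per-bin scores into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_disparity(logits, disparities):
    """Conventional readout: probability-weighted mean of the discrete bins."""
    probs = softmax(logits)
    return sum(p * d for p, d in zip(probs, disparities))

def offset_disparity(logits, offsets, disparities):
    """Assumed readout: most probable bin, shifted by that bin's real-valued offset."""
    probs = softmax(logits)
    k = max(range(len(probs)), key=probs.__getitem__)
    return disparities[k] + offsets[k]

# Hypothetical pixel on an object boundary: mass near disparity 10 and near 40.
disparities = [10.0, 20.0, 30.0, 40.0]
logits = [2.0, 0.0, 0.0, 1.8]
offsets = [0.3, 0.0, 0.0, -0.2]

mean_value = expected_disparity(logits, disparities)
mode_value = offset_disparity(logits, offsets, disparities)
```

With these toy numbers the weighted mean is about 23.7, a disparity that belongs to neither surface, while the offset-based readout gives 10.3, a non-integer value on the more probable surface.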

Experimental Validation

The authors validate their approach with several stereo networks, including PSMNet and GANet, on stereo disparity estimation, depth estimation, and 3D object detection. The experiments show substantially lower end-point and pixel-threshold errors, with notable gains around foreground objects and boundaries. The proposed continuous disparity network (CDN) consistently outperforms the baselines on metrics such as End-Point Error (EPE), Root Mean Square Error (RMSE), and Absolute Relative Error (ABSR).
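
For readers unfamiliar with these metrics, the following plain-Python sketch shows their commonly used definitions. The function names and sample values are hypothetical, and the exact evaluation protocol in the paper (valid-pixel masks, thresholds, units) may differ.

```python
import math

def end_point_error(pred, gt):
    """EPE: mean absolute difference between predicted and true values."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(gt)

def pixel_threshold_error(pred, gt, k):
    """Fraction of pixels whose absolute error exceeds k."""
    return sum(1 for p, g in zip(pred, gt) if abs(p - g) > k) / len(gt)

def rmse(pred, gt):
    """Root mean square error."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))

def abs_rel_error(pred, gt):
    """ABSR: mean absolute error relative to the true value."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)

# Made-up values for four pixels.
pred = [10.0, 20.5, 33.0, 40.0]
gt = [10.0, 20.0, 30.0, 44.0]

epe = end_point_error(pred, gt)
one_px = pixel_threshold_error(pred, gt, 1.0)
```

On these four made-up pixels the absolute errors are 0, 0.5, 3, and 4, so the EPE is 1.875 and half of the pixels exceed the 1-pixel threshold.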

Discussion on Theoretical and Practical Implications

The Wasserstein distance is a particularly fitting loss for stereo disparity estimation, where the predicted and true distributions may have non-overlapping supports and the true distribution may be multi-modal. Unlike overlap-based divergences, it stays finite and grows with how far probability mass must move, which gives a methodologically sound way to compare distributions and supports stable, efficient training.
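
To make this concrete, the sketch below computes the 1-Wasserstein distance between two discrete distributions on the real line by integrating the absolute difference of their cumulative distribution functions, a standard identity in one dimension. It is written for illustration only: the function names and numbers are assumptions, and the paper's actual loss may use a different order of the distance or a different formulation.

```python
def wasserstein1(xs, ps, ys, qs):
    """W1 between sum_i ps[i]*delta(xs[i]) and sum_j qs[j]*delta(ys[j]) in 1D.

    Uses W1 = integral of |F(t) - G(t)| dt, where F and G are the two CDFs.
    """
    events = sorted([(x, p) for x, p in zip(xs, ps)] +
                    [(y, -q) for y, q in zip(ys, qs)])
    total, cdf_gap, prev = 0.0, 0.0, events[0][0]
    for pos, mass in events:
        total += abs(cdf_gap) * (pos - prev)  # area accumulated since the last point
        cdf_gap += mass                       # F - G jumps at this point
        prev = pos
    return total

def w1_to_dirac(xs, ps, target):
    """Closed form when the target is a single point mass at `target`."""
    return sum(p * abs(x - target) for x, p in zip(xs, ps))

# Non-overlapping point masses still receive a graded penalty.
near = wasserstein1([12.0], [1.0], [10.0], [1.0])
far = wasserstein1([30.0], [1.0], [10.0], [1.0])

# A bimodal prediction matched against a bimodal target has zero distance.
bimodal = wasserstein1([10.0, 40.0], [0.5, 0.5], [10.0, 40.0], [0.5, 0.5])
```

Here `near` is 2 and `far` is 20, so the penalty scales with how far the prediction is from the target even though the two distributions never overlap, and `bimodal` is 0 because a two-peaked target is matched exactly by a two-peaked prediction.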

The proposed method also has broader implications. Because it was applied on top of existing stereo networks such as PSMNet and GANet, it can be integrated into existing stereo estimation pipelines, suggesting a path toward improving applications where depth perception is crucial, from autonomous navigation to augmented reality.

Future Directions and Speculations

The paper hints at the potential for further enhancements through multi-task learning approaches that could incorporate other streams of data, such as segmentation outputs. Moreover, incorporating uncertainty estimations in depth and disparity predictions could yield richer and more robust models suitable for dynamic environments where safety and precision are paramount.

Overall, the authors provide a significant step toward more accurate and reliable stereo disparity estimation, laying the groundwork for future developments in the field of computer vision and autonomous systems. The proposal to use Wasserstein distances opens up avenues for leveraging similar methods in other domain-specific applications involving distribution matching and optimization.

GitHub

GitHub - Div99/W-Stereo-Disp: (NeurIPS 2020 Spotlight) Wasserstein Distances for Stereo Disparity Estimation (102 stars)
