EdgeStereo: A Context Integrated Residual Pyramid Network for Stereo Matching (1803.05196v3)

Published 14 Mar 2018 in cs.CV

Abstract: Recent convolutional neural networks, especially end-to-end disparity estimation models, achieve remarkable performance on the stereo matching task. However, existing methods, even those with complicated cascade structures, may fail in textureless regions, at boundaries, and on tiny details. To address these problems, we propose a multi-task network, EdgeStereo, composed of a backbone disparity network and an edge sub-network. Given a binocular image pair, our model enables end-to-end prediction of both the disparity map and the edge map. In the disparity branch, we design a context pyramid to encode multi-scale context information, followed by a compact residual pyramid for cascaded refinement. To further preserve subtle details, EdgeStereo integrates edge cues through feature embedding and edge-aware smoothness loss regularization. Comparative results demonstrate that stereo matching and edge detection can help each other in the unified model. Furthermore, our method achieves state-of-the-art performance on both the KITTI Stereo and Scene Flow benchmarks, which demonstrates the effectiveness of our design.

Authors (4)

Xiao Song (18 papers)
Xu Zhao (64 papers)
Hanwen Hu (3 papers)
Liangji Fang (12 papers)

Summary

The paper introduces a novel integrated multi-task network that unifies stereo matching and edge detection to enhance disparity estimation in challenging regions.
It employs a context pyramid and residual refinement technique to capture multi-scale information and address issues in non-textured and boundary areas.
The model achieves state-of-the-art performance with lower pixel error rates on benchmarks like KITTI and Scene Flow compared to PSMNet and GC-Net.

EdgeStereo: A Context Integrated Residual Pyramid Network for Stereo Matching

EdgeStereo is a stereo matching method that targets the regions where existing models tend to fail: textureless surfaces, object boundaries, and fine details. The authors combine stereo matching and edge detection in a single multi-task network that couples a backbone disparity network with an edge sub-network.

Network Design

EdgeStereo has two main components: a disparity estimation network, the Context Pyramid-based Residual Pyramid Network (CP-RPN), and an edge detection sub-network. CP-RPN uses a context pyramid to encode multi-scale context information and then refines its output through a compact residual pyramid in a cascaded manner. Encoding context at several scales helps the model in ambiguous regions such as textureless areas. The edge sub-network helps the disparity branch preserve detail in two ways: edge features are embedded into the disparity branch, and an edge-aware smoothness loss regularizes the predicted disparity. According to the comparative results in the paper, the two tasks benefit each other within the unified model.
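The cascaded refinement idea behind a residual pyramid can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: in the network the residuals are predicted by learned layers, whereas here they are placeholder arrays, and the function names `upsample2x` and `refine_coarse_to_fine`, the nearest-neighbour upsampling, and the doubling of disparity values at each scale are all assumptions made for illustration.

```python
import numpy as np


def upsample2x(disp):
    """Nearest-neighbour 2x upsampling (assumed here for simplicity).

    Disparities are measured in pixels, so the values are doubled
    when the resolution doubles.
    """
    up = np.repeat(np.repeat(disp, 2, axis=0), 2, axis=1)
    return 2.0 * up


def refine_coarse_to_fine(coarse_disp, residuals):
    """Start from the coarsest disparity map and add one residual per scale.

    `residuals` is ordered from the second-coarsest scale to the finest.
    Returns the disparity map at every scale, coarsest first.
    """
    disp = np.asarray(coarse_disp, dtype=np.float64)
    outputs = [disp]
    for res in residuals:
        disp = upsample2x(disp)
        res = np.asarray(res, dtype=np.float64)
        if disp.shape != res.shape:
            raise ValueError("residual shape must match the upsampled disparity")
        disp = disp + res  # residual correction at this scale
        outputs.append(disp)
    return outputs


# Placeholder inputs: a constant coarse map and two constant residual maps.
coarse = np.full((2, 3), 4.0)
residuals = [np.full((4, 6), 0.5), np.full((8, 12), -1.0)]
pyramid = refine_coarse_to_fine(coarse, residuals)
print([p.shape for p in pyramid])
```

Each scale only has to predict a correction to an upsampled estimate rather than a full disparity map, which is the usual motivation for residual refinement.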

Performance and Results

EdgeStereo is evaluated on the KITTI Stereo and Scene Flow datasets, where the paper reports state-of-the-art results with improved accuracy and detail retention in the disparity maps. The main metric is the percentage of erroneous pixels, reported under several evaluation settings, and the error rates are lower than those of competing methods such as PSMNet and GC-Net.
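For readers unfamiliar with the metric, the following is a minimal sketch of a percentage-of-erroneous-pixels computation. The summary does not state the exact thresholds, so the defaults below (an error larger than 3 pixels and larger than 5% of the true disparity, in the style of the KITTI 2015 outlier criterion) are assumptions, as are the function name `pixel_error_rate` and the convention that a ground-truth value of 0 marks an invalid pixel.

```python
import numpy as np


def pixel_error_rate(pred, gt, abs_thresh=3.0, rel_thresh=0.05, valid=None):
    """Percentage of valid pixels whose disparity error exceeds both thresholds.

    Assumed convention: pixels with gt <= 0 carry no ground truth.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid is None:
        valid = gt > 0
    n_valid = int(valid.sum())
    if n_valid == 0:
        raise ValueError("no valid ground-truth pixels")
    err = np.abs(pred - gt)
    bad = (err > abs_thresh) & (err > rel_thresh * gt)
    return 100.0 * float((bad & valid).sum()) / n_valid


# Toy example: one pixel has no ground truth; two of the five valid pixels are outliers.
gt = np.array([[10.0, 20.0, 0.0],
               [100.0, 50.0, 40.0]])
pred = np.array([[10.5, 24.0, 7.0],
                 [104.0, 50.0, 30.0]])
rate = pixel_error_rate(pred, gt)
print(f"{rate:.1f}% erroneous pixels")
```

Note that the pixel with a true disparity of 100 and an error of 4 pixels is not counted as an outlier, because the error stays below 5% of the true value.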

Implications and Future Directions

The paper suggests that combining stereo matching with edge detection in one multi-task network benefits both tasks: edge cues improve disparity estimation, and the edge map predictions improve as well. Similar multi-task strategies may therefore be useful elsewhere in computer vision. The authors also propose a multi-phase training strategy, which lets the network be trained without requiring multi-task labels during the initial stages.

Looking ahead, the architecture could inform related tasks such as optical flow and semantic segmentation, which might adopt similar multi-task approaches and context pyramid designs. Further research may also extend edge-aware techniques to other forms of regularization and loss functions, with the aim of improving the consistency and robustness of predictions.
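As a reference point for the edge-aware regularization discussed here, the sketch below shows one common form of an edge-aware smoothness term: disparity gradients are penalized everywhere except where the edge map itself changes. The exact formulation used in EdgeStereo is not given in this summary, so the exponential weighting, the parameter `beta`, and the function name `edge_aware_smoothness` are assumptions.

```python
import numpy as np


def edge_aware_smoothness(disp, edge_prob, beta=1.0):
    """Mean absolute disparity gradient, down-weighted where the edge map changes.

    Assumed form: |grad d| * exp(-beta * |grad e|), summed over x and y.
    """
    disp = np.asarray(disp, dtype=np.float64)
    edge = np.asarray(edge_prob, dtype=np.float64)
    if disp.shape != edge.shape:
        raise ValueError("disparity and edge maps must have the same shape")
    # Forward differences along x (columns) and y (rows).
    ddx = np.abs(np.diff(disp, axis=1))
    ddy = np.abs(np.diff(disp, axis=0))
    wx = np.exp(-beta * np.abs(np.diff(edge, axis=1)))
    wy = np.exp(-beta * np.abs(np.diff(edge, axis=0)))
    return float((ddx * wx).mean() + (ddy * wy).mean())


# A disparity step between columns 1 and 2, with and without a detected edge there.
disp = np.tile(np.array([1.0, 1.0, 5.0, 5.0]), (3, 1))
edge_at_step = np.tile(np.array([0.0, 0.0, 1.0, 0.0]), (3, 1))
no_edge = np.zeros_like(disp)

loss_with_edge = edge_aware_smoothness(disp, edge_at_step, beta=2.0)
loss_without_edge = edge_aware_smoothness(disp, no_edge, beta=2.0)
print(loss_with_edge < loss_without_edge)
```

The same disparity discontinuity costs less when the edge map marks an edge at that location, so the regularizer smooths flat regions without blurring boundaries.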

In conclusion, EdgeStereo advances stereo matching by integrating multi-scale context and edge cues, with notable improvements in textureless regions, at boundaries, and on fine details.