MC-Stereo: Multi-peak Lookup and Cascade Search Range for Stereo Matching (2311.02340v2)
Abstract: Stereo matching is a fundamental task in scene comprehension. In recent years, the method based on iterative optimization has shown promise in stereo matching. However, the current iteration framework employs a single-peak lookup, which struggles to handle the multi-peak problem effectively. Additionally, the fixed search range used during the iteration process limits the final convergence effects. To address these issues, we present a novel iterative optimization architecture called MC-Stereo. This architecture mitigates the multi-peak distribution problem in matching through the multi-peak lookup strategy, and integrates the coarse-to-fine concept into the iterative framework via the cascade search range. Furthermore, given that feature representation learning is crucial for successful learn-based stereo matching, we introduce a pre-trained network to serve as the feature extractor, enhancing the front end of the stereo matching pipeline. Based on these improvements, MC-Stereo ranks first among all publicly available methods on the KITTI-2012 and KITTI-2015 benchmarks, and also achieves state-of-the-art performance on ETH3D. Code is available at https://github.com/MiaoJieF/MC-Stereo.
- Correlate-and-excite: Real-time stereo matching via guided cost volume excitation. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3542–3548. IEEE, 2021.
- Instereo2k: a large real dataset for stereo matching in indoor scenes. Science China Information Sciences, 63:1–11, 2020.
- Matching-space stereo networks for cross-domain generalization. In 2020 International Conference on 3D Vision (3DV), pages 364–373. IEEE, 2020.
- Pyramid stereo matching network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5410–5418, 2018.
- Region separable stereo matching. IEEE Transactions on Multimedia, 2022.
- Coatrsnet: Fully exploiting convolution and attention for stereo matching by region separation. International Journal of Computer Vision, pages 1–18, 2023.
- Hierarchical neural architecture search for deep stereo matching. Advances in Neural Information Processing Systems, 33:22158–22169, 2020.
- Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009.
- Are we ready for autonomous driving? the kitti vision benchmark suite. In 2012 IEEE conference on computer vision and pattern recognition, pages 3354–3361. IEEE, 2012.
- Group-wise correlation stereo network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3273–3282, 2019.
- Heiko Hirschmuller. Stereo processing by semiglobal matching and mutual information. IEEE Transactions on pattern analysis and machine intelligence, 30(2):328–341, 2007.
- Learning to estimate hidden motions with global motion aggregation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9772–9781, 2021.
- End-to-end learning of geometry and context for deep stereo regression. In Proceedings of the IEEE international conference on computer vision, pages 66–75, 2017.
- Practical stereo matching via cascaded recurrent network with adaptive correlation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16263–16272, 2022.
- Raft-stereo: Multilevel recurrent field transforms for stereo matching. In 2021 International Conference on 3D Vision (3DV), pages 218–227. IEEE, 2021.
- Local similarity pattern and cost self-reassembling for deep stereo matching networks. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 1647–1655, 2022a.
- Graftnet: Towards domain generalized stereo matching with a broad-spectrum and task-oriented feature. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13012–13021, 2022b.
- A convnet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11976–11986, 2022c.
- Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
- Uasnet: Uncertainty adaptive sampling network for deep stereo matching. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6311–6319, 2021.
- A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4040–4048, 2016.
- Joint 3d estimation of vehicles and scene flow. ISPRS annals of the photogrammetry, remote sensing and spatial information sciences, 2:427, 2015.
- Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.
- A multi-view stereo benchmark with high-resolution images and multi-camera videos. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3260–3269, 2017.
- Sgm-nets: Semi-global matching with neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 231–240, 2017.
- Cfnet: Cascade and fused cost volume for robust stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13906–13915, 2021.
- Pcw-net: Pyramid combination and warping cost volume for stereo matching. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXII, pages 280–297. Springer, 2022.
- Stereo matching using belief propagation. IEEE Transactions on pattern analysis and machine intelligence, 25(7):787–800, 2003.
- Hitnet: Hierarchical iterative tile refinement network for real-time stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14362–14372, 2021.
- Raft: Recurrent all-pairs field transforms for optical flow. In European conference on computer vision, pages 402–419. Springer, 2020.
- Attention concatenation volume for accurate and efficient stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12981–12990, 2022.
- Iterative geometry encoding volume for stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21919–21928, 2023a.
- Accurate and efficient stereo matching via attention concatenation volume. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023b.
- Cgi-stereo: Accurate and real-time stereo matching via context and geometry interaction. arXiv preprint arXiv:2301.02789, 2023c.
- Aanet: Adaptive aggregation network for efficient stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1959–1968, 2020.
- Non-parametric depth distribution modelling based depth inference for multi-view stereo. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8626–8634, 2022.
- Stereo matching by training a convolutional neural network to compare image patches. J. Mach. Learn. Res., 17(1):2287–2318, 2016.
- Ga-net: Guided aggregation net for end-to-end stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 185–194, 2019.
- Domain-invariant stereo matching networks. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II 16, pages 420–439. Springer, 2020.
- Revisiting domain generalized stereo matching networks from a feature consistency perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13001–13011, 2022.
- Cross-based local stereo matching using orthogonal integral images. IEEE transactions on circuits and systems for video technology, 19(7):1073–1079, 2009.