AANet: Adaptive Aggregation Network for Efficient Stereo Matching

Published 20 Apr 2020 in cs.CV | (2004.09548v1)

Abstract: Despite the remarkable progress made by learning based stereo matching algorithms, one key challenge remains unsolved. Current state-of-the-art stereo models are mostly based on costly 3D convolutions, the cubic computational complexity and high memory consumption make it quite expensive to deploy in real-world applications. In this paper, we aim at completely replacing the commonly used 3D convolutions to achieve fast inference speed while maintaining comparable accuracy. To this end, we first propose a sparse points based intra-scale cost aggregation method to alleviate the well-known edge-fattening issue at disparity discontinuities. Further, we approximate traditional cross-scale cost aggregation algorithm with neural network layers to handle large textureless regions. Both modules are simple, lightweight, and complementary, leading to an effective and efficient architecture for cost aggregation. With these two modules, we can not only significantly speed up existing top-performing models (e.g., $41\times$ than GC-Net, $4\times$ than PSMNet and $38\times$ than GA-Net), but also improve the performance of fast stereo models (e.g., StereoNet). We also achieve competitive results on Scene Flow and KITTI datasets while running at 62ms, demonstrating the versatility and high efficiency of the proposed method. Our full framework is available at https://github.com/haofeixu/aanet .

Authors (2)

Citations (414)

Summary

The paper introduces AANet, which eliminates costly 3D convolutions through sparse intra-scale and adaptive cross-scale aggregation.
The paper reports that its two aggregation modules make existing models much faster (e.g., 41× faster than GC-Net) and that the full framework runs at 62ms.
The paper achieves competitive results on Scene Flow and KITTI, using cross-scale aggregation for large textureless regions and sparse-point aggregation at disparity discontinuities.

An Analysis of "AANet: Adaptive Aggregation Network for Efficient Stereo Matching"

The paper "AANet: Adaptive Aggregation Network for Efficient Stereo Matching" introduces a framework that addresses the computational cost of stereo matching models built on expensive 3D convolutions. The authors propose an architecture, AANet, that replaces those convolutions with a cheaper form of cost aggregation while maintaining accuracy comparable to existing state-of-the-art models.
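To make the cost argument concrete, the following is a rough back-of-the-envelope sketch, not a figure from the paper. It counts multiply-accumulate operations (MACs) for a single convolution layer using the standard formula, first as a 3D convolution over a C×D×H×W cost volume and then as a 2D convolution over a D×H×W cost volume in which the disparity axis serves as the channel axis. The sizes `K`, `C`, `D`, `H`, and `W` are illustrative assumptions, as is the choice to compare exactly one layer of each kind.

```python
def conv3d_macs(k, c_in, c_out, d, h, w):
    """MACs for one 3D conv layer over a C x D x H x W cost volume."""
    return k ** 3 * c_in * c_out * d * h * w


def conv2d_macs(k, c_in, c_out, h, w):
    """MACs for one 2D conv layer over a C x H x W tensor."""
    return k ** 2 * c_in * c_out * h * w


# Assumed, illustrative sizes (not taken from the paper).
K = 3            # kernel size
C = 32           # feature channels of a 4D cost volume
D = 48           # disparity candidates at reduced resolution
H, W = 135, 240  # spatial size at reduced resolution

macs_3d = conv3d_macs(K, C, C, D, H, W)
# For a 3D cost volume, the D disparity candidates act as the channels.
macs_2d = conv2d_macs(K, D, D, H, W)
ratio = macs_3d / macs_2d  # simplifies to K * C**2 / D

print(f"3D conv: {macs_3d:,} MACs")
print(f"2D conv: {macs_2d:,} MACs")
print(f"ratio:   {ratio:.1f}x")
```

Under these assumed sizes the per-layer ratio reduces to K·C²/D. Measured speedups depend on the whole network and on the hardware, so this ratio should not be read as a prediction of the reported runtimes.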

Key Contributions

The paper makes several critical contributions to the domain of stereo matching:

Elimination of 3D Convolutions: AANet proposes to replace costly 3D convolution operations with a combination of sparse points-based intra-scale aggregation and neural network approximations of cross-scale aggregation. This transition significantly reduces computational complexity and memory demands.
Efficient Cost Aggregation: The paper uses a sparse points representation for intra-scale aggregation to alleviate the edge-fattening issue commonly found at disparity discontinuities. Because the sampling locations are flexible rather than fixed to a regular window, aggregation can avoid mixing costs from different surfaces near object boundaries.
Adaptive Cross-Scale Aggregation: The model approximates the traditional cross-scale cost aggregation algorithm with neural network layers. Multi-scale cost volumes are constructed in parallel and exchange information across scales, which improves performance in large textureless regions.
Performance: AANet achieves competitive results on well-known datasets such as Scene Flow and KITTI while running at 62ms. The two modules also speed up existing models substantially (e.g., $41\times$ faster than GC-Net), which makes the approach attractive for real-world deployment.
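The cross-scale idea in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it handles a single disparity slice per scale, it uses nearest-neighbour resampling as a stand-in for the learned downsampling and upsampling layers, and it ignores that coarser scales would normally also cover a coarser disparity range. The names `resize_nearest` and `cross_scale_aggregate` are hypothetical.

```python
def resize_nearest(cost, out_h, out_w):
    """Resample an H x W cost slice to out_h x out_w by nearest-neighbour lookup."""
    in_h, in_w = len(cost), len(cost[0])
    return [
        [cost[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def cross_scale_aggregate(pyramid):
    """For each scale, sum every scale's slice after resizing it to that scale.

    pyramid: list of H_s x W_s cost slices for one disparity candidate,
    finest scale first. A learned model would replace the plain resize
    with trainable layers; this sketch does not.
    """
    fused = []
    for target in pyramid:
        h, w = len(target), len(target[0])
        resized = [resize_nearest(c, h, w) for c in pyramid]
        fused.append([
            [sum(r[y][x] for r in resized) for x in range(w)]
            for y in range(h)
        ])
    return fused


# Toy two-level pyramid: a 4x4 fine slice and a 2x2 coarse slice.
fine = [[1.0] * 4 for _ in range(4)]
coarse = [[2.0] * 2 for _ in range(2)]
fused = cross_scale_aggregate([fine, coarse])
print(fused[0][0], fused[1][0])
```

Each output scale receives evidence from every other scale. This is the mechanism by which coarse, wide-context costs can help in regions where the fine scale has no texture to match on.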

Theoretical and Practical Implications

On a theoretical level, the work provides a framework for cost aggregation that is flexible and computationally economical by effectively leveraging adaptive sampling strategies. Practical applications of AANet extend to areas requiring stereo vision, such as robot navigation, augmented reality, and autonomous vehicles, where efficiency and speed are paramount.
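As a small illustration of why adaptive sampling helps at disparity discontinuities, the sketch below aggregates a cost slice over three sample points, once on a fixed horizontal window and once with one point shifted by an offset. It is a simplified stand-in, not the paper's module: the offsets and weights are hand-picked here, whereas a learned model would predict them from data, and the helper names `bilinear` and `aggregate_at` are hypothetical.

```python
import math


def bilinear(cost, y, x):
    """Bilinearly sample an H x W cost slice at fractional (y, x), clamped to the border."""
    h, w = len(cost), len(cost[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = cost[y0][x0] * (1 - fx) + cost[y0][x1] * fx
    bottom = cost[y1][x0] * (1 - fx) + cost[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def aggregate_at(cost, y, x, points, weights, offsets):
    """Weighted sum over a few sample points, each shifted by its own offset."""
    total = 0.0
    for (py, px), wk, (dy, dx) in zip(points, weights, offsets):
        total += wk * bilinear(cost, y + py + dy, x + px + dx)
    return total


# Toy cost slice with a vertical edge: columns 0-1 are one surface, 2-3 another.
cost = [[0.0, 0.0, 10.0, 10.0] for _ in range(3)]
points = [(0, -1), (0, 0), (0, 1)]   # fixed horizontal 3-tap window
weights = [1 / 3, 1 / 3, 1 / 3]

# Regular sampling at pixel (1, 1): the right tap crosses the edge.
regular = aggregate_at(cost, 1, 1, points, weights, [(0, 0)] * 3)

# Adaptive sampling: the right tap is moved back onto the left surface.
adaptive = aggregate_at(cost, 1, 1, points, weights, [(0, 0), (0, 0), (-1, -1)])

print(regular, adaptive)
```

With the fixed window, the pixel next to the edge picks up cost from the neighbouring surface, which is the edge-fattening effect. Shifting the offending sample keeps the aggregate on the pixel's own surface.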

Strong Numerical Results

With the two proposed modules in place of 3D-convolution-based aggregation, the authors report a $4\times$ speedup over PSMNet and a $38\times$ speedup over GA-Net. The modules also improve the accuracy of fast stereo models such as StereoNet, so the approach helps at both ends of the accuracy-versus-speed trade-off.

Future Prospects

Future research directions could explore the application of AANet's architecture to other domains beyond stereo matching, such as multi-view stereo and optical flow estimation. Additionally, its lightweight design could prove beneficial for downstream processes, such as stereo-based 3D object detection.

Conclusion

In summary, AANet challenges the conventional reliance on 3D convolutions in stereo matching models, presenting an efficient and effective approach to cost aggregation. The robust numerical results and innovative method suggest a promising shift in stereo vision modeling, with implications for both theoretical exploration and practical implementations in engineering sophisticated vision systems.
