- The paper introduces a sparse CNN framework that cuts memory use and execution time more than 10-fold while maintaining high matching performance.
- It employs a two-stage correspondence relocalisation mechanism to refine match coordinates from grid-level estimations to sub-pixel accuracy.
- The method achieves superior results on benchmarks such as HPatches and InLoc, which is promising for real-time visual localisation and 3D reconstruction.
 
 
Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions
The paper presents Sparse-NCNet, an approach to image matching that addresses key limitations of Neighbourhood Consensus Networks (NCNet) by incorporating submanifold sparse convolutions. The focus is on improving memory efficiency and reducing inference time while improving matching accuracy. The primary contribution is a large reduction in computational demands, obtained by sparsifying the correlation tensor and processing it with a 4D convolutional neural network (CNN) built from submanifold sparse convolutions. This yields more than a 10-fold reduction in memory footprint and execution time without compromising performance. In addition, a two-stage correspondence relocalisation mechanism improves the localisation precision of the matches.
Methodology
Sparse-NCNet retains only the most promising matches in the correlation tensor and processes them with a sparse CNN. Submanifold sparse convolutions preserve the sparsity of the data from layer to layer, so computation is spent only on the retained entries. Additionally, the correlation tensor is processed with a permutation-invariant CNN, which improves robustness by propagating information within local neighbourhoods.
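The sparsification and sparse processing described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names `sparse_correlation` and `submanifold_conv4d`, the choice of keeping the `k` best matches per position of the first image, and the single-channel 3x3x3x3 kernel are all hypothetical choices made for illustration.

```python
import itertools
import numpy as np


def sparse_correlation(feat_a, feat_b, k=2):
    """Keep the k strongest matches in feat_b for every position of feat_a.

    feat_a: (ha, wa, c) and feat_b: (hb, wb, c) dense feature maps.
    Returns coords (N, 4) holding (i, j, k, l) indices and values (N,).
    Assumption: selection is done per position of image A only.
    """
    ha, wa, c = feat_a.shape
    hb, wb, _ = feat_b.shape
    # L2-normalise descriptors so the dot product is a cosine similarity.
    fa = feat_a.reshape(-1, c)
    fb = feat_b.reshape(-1, c)
    fa = fa / np.linalg.norm(fa, axis=1, keepdims=True)
    fb = fb / np.linalg.norm(fb, axis=1, keepdims=True)
    corr = fa @ fb.T  # dense (ha*wa, hb*wb) correlation, used only for selection
    # Column indices of the k largest entries in each row (unordered).
    top = np.argpartition(-corr, k - 1, axis=1)[:, :k]
    rows = np.repeat(np.arange(ha * wa), k)
    cols = top.ravel()
    values = corr[rows, cols]
    # Convert flat indices back to 4D coordinates (i, j) in A and (k, l) in B.
    i, j = np.divmod(rows, wa)
    kk, ll = np.divmod(cols, wb)
    return np.stack([i, j, kk, ll], axis=1), values


def submanifold_conv4d(coords, values, kernel):
    """Single-channel 4D submanifold convolution with a (3, 3, 3, 3) kernel.

    Outputs are produced only at sites that are already active, and only
    active neighbours contribute, so the set of active sites never grows.
    Uses the cross-correlation convention common in deep learning.
    """
    sites = [tuple(c) for c in coords.tolist()]
    active = dict(zip(sites, values))
    offsets = list(itertools.product((-1, 0, 1), repeat=4))
    out = np.zeros(len(sites))
    for n, site in enumerate(sites):
        for off in offsets:
            nb = tuple(s + o for s, o in zip(site, off))
            if nb in active:
                out[n] += kernel[off[0] + 1, off[1] + 1, off[2] + 1, off[3] + 1] * active[nb]
    return out
```

For example, calling `sparse_correlation(fa, fb, k=2)` on two random feature maps and passing the result to `submanifold_conv4d` returns one response per retained match, so the output has exactly the same sparsity pattern as the input. A dense 4D convolution would instead produce a response at every cell of the full tensor.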
To address poorly localised correspondences, Sparse-NCNet implements a relocalisation mechanism with two stages. It begins with a hard relocalisation step that refines match coordinates by maximising match likelihood within a local region at quadrupled grid resolution. This is followed by a soft relocalisation step that uses softargmax to achieve sub-pixel accuracy, making the matches more useful in high-precision tasks such as visual localisation and 3D reconstruction.
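The soft relocalisation step relies on softargmax, which replaces a hard maximum with a softmax-weighted average of positions and therefore yields fractional coordinates. The sketch below shows the operation on a small window of match scores. The function name `soft_argmax_2d`, the window size, and the `temperature` parameter are assumptions made for illustration; the summary does not specify how the paper configures this step.

```python
import numpy as np


def soft_argmax_2d(scores, temperature=1.0):
    """Return the expected (dy, dx) offset from the centre of a score window.

    scores: (h, w) array of match scores around a coarse match position.
    temperature: hypothetical sharpness parameter; smaller values bring the
    result closer to a hard argmax.
    """
    scores = np.asarray(scores, dtype=float)
    h, w = scores.shape
    # Numerically stable softmax over the whole window.
    logits = (scores - scores.max()) / temperature
    weights = np.exp(logits)
    weights /= weights.sum()
    # Offsets of every cell relative to the window centre.
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    dy = float((weights * (ys - cy)).sum())
    dx = float((weights * (xs - cx)).sum())
    return dy, dx
```

Adding the returned offset to the coordinates produced by the hard step gives a sub-pixel match position. A perfectly symmetric window yields a zero offset, while a window whose scores lean to one side shifts the estimate by a fraction of a cell in that direction.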
Results
The effectiveness of Sparse-NCNet is validated on three benchmarks, HPatches Sequences, InLoc, and Aachen Day-Night, where it shows superior or comparable performance to previous methods. On HPatches Sequences, Sparse-NCNet outperforms the state of the art and handles both viewpoint and illumination changes well. The improvements in computational efficiency make real-time applications more feasible. On the InLoc benchmark for indoor localisation, Sparse-NCNet sets a new record for accuracy, which suggests that performing feature extraction, matching, and filtering in a single pipeline gives robust results. On the Aachen Day-Night benchmark, Sparse-NCNet achieves results on par with leading techniques on the challenging task of localising night-time images against a daytime model.
Implications and Future Work
Sparse-NCNet is a substantial step forward in using sparse representations within CNN architectures for image matching, offering a favourable balance between computational efficiency and matching performance. The approach brings real-time processing in resource-constrained environments closer and opens up new possibilities for large-scale 3D reconstruction and real-time navigation systems.
Future research directions include integrating these sparse convolutional networks into other domains such as video processing, and extending the model to multispectral data for improved robustness across diverse environmental conditions. The relocalisation strategy could also be developed further, for example by integrating more advanced interpolation techniques to overcome residual localisation errors.