Neighbourhood Consensus Networks (1810.10510v2)

Published 24 Oct 2018 in cs.CV and cs.LG

Abstract: We address the problem of finding reliable dense correspondences between a pair of images. This is a challenging task due to strong appearance differences between the corresponding scene elements and ambiguities generated by repetitive patterns. The contributions of this work are threefold. First, inspired by the classic idea of disambiguating feature matches using semi-local constraints, we develop an end-to-end trainable convolutional neural network architecture that identifies sets of spatially consistent matches by analyzing neighbourhood consensus patterns in the 4D space of all possible correspondences between a pair of images without the need for a global geometric model. Second, we demonstrate that the model can be trained effectively from weak supervision in the form of matching and non-matching image pairs without the need for costly manual annotation of point to point correspondences. Third, we show the proposed neighbourhood consensus network can be applied to a range of matching tasks including both category- and instance-level matching, obtaining the state-of-the-art results on the PF Pascal dataset and the InLoc indoor visual localization benchmark.

Summary

The paper introduces an end-to-end trainable CNN that finds spatially consistent dense correspondences by analyzing neighbourhood consensus patterns in the 4D space of all possible matches between two images.
The model is trained with weak supervision from matching and non-matching image pairs, removing the need for manual point-to-point annotations.
The paper reports state-of-the-art results on the PF-Pascal and InLoc benchmarks, covering both category-level and instance-level matching.

Overview of Neighbourhood Consensus Networks

The paper, "Neighbourhood Consensus Networks," presents a novel approach to the problem of finding reliable dense correspondences between pairs of images. This task is crucial for various applications, including 3D reconstruction, visual localization, and object recognition. Given the challenges posed by significant appearance differences and ambiguities induced by repetitive patterns, the authors introduce a method that leverages semi-local constraints through an end-to-end trainable convolutional neural network (CNN) architecture.

Key Contributions

The paper makes three main contributions:

Neural Architecture for Neighbourhood Consensus: The authors propose a CNN model that identifies spatially consistent feature matches by analyzing patterns in a 4D space of all possible correspondences between image pairs. Notably, this approach circumvents the need for a global geometric model, thus offering a robust alternative to traditional methods that rely on handcrafted point descriptors followed by geometric constraint filtering.
Weak Supervision: The proposed model can be effectively trained using weak supervision. By utilizing image pairs labeled only as matching or non-matching, the approach bypasses the necessity for costly manual annotation of point-to-point correspondences, which has been a significant hurdle in training similar systems.
Versatility Across Tasks: Demonstrating the model's versatility, the authors apply it to a range of matching tasks at both category- and instance-levels, achieving state-of-the-art results on benchmark datasets such as PF-Pascal for semantic object matching and InLoc for indoor visual localization.
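
The first two contributions can be illustrated with a minimal NumPy sketch. It builds the 4D tensor of cosine similarities between every location of one feature map and every location of the other, then computes a simplified weakly supervised loss from a pair-level label only. The function names, the feature shapes, and the exact form of the score (softmax in both directions, best match per location, averaged) are assumptions made for illustration. The learned 4D filtering that the network applies to this tensor is omitted, so this is not the authors' implementation.

```python
import numpy as np


def correlation_4d(feat_a, feat_b):
    """Cosine similarity between all location pairs of two feature maps.

    feat_a: (h_a, w_a, d), feat_b: (h_b, w_b, d)
    returns: (h_a, w_a, h_b, w_b), one score per candidate correspondence.
    """
    a = feat_a / (np.linalg.norm(feat_a, axis=-1, keepdims=True) + 1e-8)
    b = feat_b / (np.linalg.norm(feat_b, axis=-1, keepdims=True) + 1e-8)
    return np.einsum("ijd,kld->ijkl", a, b)


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def mean_match_score(corr):
    """Average confidence of the best match, taken in both directions."""
    h_a, w_a, h_b, w_b = corr.shape
    flat = corr.reshape(h_a * w_a, h_b * w_b)
    score_a = softmax(flat, axis=1).max(axis=1)  # best match in B per A location
    score_b = softmax(flat, axis=0).max(axis=0)  # best match in A per B location
    return 0.5 * (score_a.mean() + score_b.mean())


def weak_loss(corr, is_match):
    """Pair-level loss: raise scores for matching pairs, lower them otherwise."""
    y = 1.0 if is_match else -1.0
    return -y * mean_match_score(corr)


# Hypothetical feature maps standing in for CNN activations.
rng = np.random.default_rng(0)
feat_a = rng.normal(size=(4, 5, 16))
feat_b = rng.normal(size=(3, 6, 16))

corr = correlation_4d(feat_a, feat_b)
print(corr.shape)  # (4, 5, 3, 6)
print(weak_loss(corr, is_match=True), weak_loss(corr, is_match=False))
```

Only the label `is_match` supervises the loss, which is why no keypoint annotations are required. In a trainable version the same loss would be computed with an automatic differentiation framework on the filtered tensor rather than on the raw correlations.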

Numerical Results and Claims

The numerical results underline the robustness and effectiveness of the proposed method:

The approach achieves a PCK (percentage of correct keypoints) of 78.9% on the PF-Pascal dataset, surpassing prior state-of-the-art methods by approximately 3 percentage points.
For indoor visual localization, the model significantly improves localization accuracy, outperforming existing baseline approaches. For example, the proposed InLoc+NC-Net configuration correctly localizes a higher percentage of queries within a 2-meter distance compared to the baseline methods.
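
For readers unfamiliar with the PCK metric cited above, the following sketch shows one common way to compute it. The normalization by the larger side of an object bounding box and the tolerance `alpha=0.1` are assumptions chosen for illustration; evaluation protocols differ in both, and the summary does not state which variant the paper uses.

```python
import math


def pck(predicted, ground_truth, bbox_size, alpha=0.1):
    """Fraction of keypoints predicted within alpha * max(bbox_size) of the truth.

    predicted, ground_truth: equal-length lists of (x, y) points.
    bbox_size: (height, width) of the object bounding box (assumed normalizer).
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        raise ValueError("at least one keypoint is required")
    threshold = alpha * max(bbox_size)
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(predicted)


# Hypothetical keypoints; with a 100 x 80 box the tolerance is 10 pixels.
predicted = [(10, 10), (50, 50), (33, 44), (0, 0)]
ground_truth = [(12, 11), (50, 65), (36, 48), (20, 0)]
print(pck(predicted, ground_truth, bbox_size=(100, 80)))  # 0.5
```

Two of the four hypothetical keypoints fall within the 10-pixel tolerance, so the score is 0.5. A dataset-level PCK is typically obtained by aggregating this quantity over all evaluated image pairs.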

The paper asserts that the neural architecture effectively learns local geometric constraints directly from data, thus outperforming traditional methods reliant on manually engineered criteria.

Implications and Future Developments

The research has implications for both theory and practice in computer vision. Rather than applying neighbourhood consensus as a post-processing step, the approach integrates it directly into a trainable network architecture. This integration could encourage further work on end-to-end learning frameworks for complex visual correspondence tasks.

Moreover, the paper opens avenues for future research on generalizing neighbourhood consensus to other domains, such as temporal sequence alignment and cross-domain object matching. Given the effectiveness of the weak supervision framework, future studies could refine it to reduce reliance on curated datasets, potentially making the model applicable to more diverse and unstructured data.

In conclusion, "Neighbourhood Consensus Networks" combines classical semi-local constraints with a modern deep learning architecture to make image matching more robust and accurate. The method is a notable contribution to the visual correspondence literature and applies directly to practical tasks such as semantic matching and indoor visual localization.
