Learning Spatial Similarity Distribution for Few-shot Object Counting (2405.11770v1)
Abstract: Few-shot object counting aims to count the objects in a query image that belong to the same class as the given exemplar images. Existing methods compute the similarity between the query image and the exemplars in the 2D spatial domain and regress the count from it. However, these methods overlook the rich information in the spatial distribution of similarity over the exemplar images, which significantly degrades matching accuracy. To address this issue, we propose a network that learns the Spatial Similarity Distribution (SSD) for few-shot object counting. It preserves the spatial structure of the exemplar features and computes a 4D similarity pyramid point-to-point between the query features and the exemplar features, capturing the complete distribution information for each point in the 4D similarity space. We propose a Similarity Learning Module (SLM) that applies efficient center-pivot 4D convolutions to the similarity pyramid to map different similarity distributions to distinct predicted density values, thereby obtaining an accurate count. Furthermore, we introduce a Feature Cross Enhancement (FCE) module that mutually enhances the query and exemplar features to improve the accuracy of feature matching. Our approach outperforms state-of-the-art methods on multiple datasets, including FSC-147 and CARPK. Code is available at https://github.com/CBalance/SSD.
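To make the point-to-point 4D similarity concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names are hypothetical, cosine similarity is assumed as the similarity measure (the abstract does not name one), and only a single feature level is shown rather than a full pyramid. The key property is that the exemplar's spatial grid is kept, so every query location receives a whole map of similarities over exemplar locations instead of one pooled score.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def similarity_4d(query, exemplar):
    """Return sim with sim[i][j][k][l] = cosine(query[i][j], exemplar[k][l]).

    query:    H_q x W_q grid of C-dimensional feature vectors (nested lists)
    exemplar: H_e x W_e grid of C-dimensional feature vectors (nested lists)
    The result has shape H_q x W_q x H_e x W_e.
    """
    # Normalize once so each dot product below is a cosine similarity.
    q = [[l2_normalize(f) for f in row] for row in query]
    e = [[l2_normalize(f) for f in row] for row in exemplar]
    return [
        [
            # For one query location, keep the full H_e x W_e similarity map.
            [[sum(a * b for a, b in zip(qf, ef)) for ef in e_row] for e_row in e]
            for qf in q_row
        ]
        for q_row in q
    ]


# Toy input (made-up values): a 2x3 query grid and a 2x2 exemplar grid, C = 2.
query = [
    [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]],
    [[3.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
]
exemplar = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [2.0, 0.0]],
]
sim = similarity_4d(query, exemplar)
print(len(sim), len(sim[0]), len(sim[0][0]), len(sim[0][0][0]))  # 4D shape
```

In a real pipeline this tensor would be computed per backbone layer with batched tensor operations, and a module such as the SLM would then process it with 4D convolutions; both steps are outside the scope of this sketch.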