A Simple-but-effective Baseline for Training-free Class-Agnostic Counting (2403.01418v1)
Abstract: Class-Agnostic Counting (CAC) aims to count the objects in an image given only a few reference examples. Earlier methods required additional training, but recent work has shown that counting is possible without training by using existing foundation models, particularly the Segment Anything Model (SAM), to count via instance-level segmentation. Although promising, current training-free methods still lag behind their training-based counterparts in performance. We present a straightforward training-free solution that effectively bridges this gap and serves as a strong baseline. Our main contribution is the identification of four key techniques that improve performance: using a superpixel algorithm to generate more precise initial point prompts, replacing the SAM encoder with an image encoder that carries richer semantic knowledge for representing candidate objects, and adopting a multiscale mechanism and a transductive prototype scheme to update the representation of the reference examples. Combining these four techniques yields significant improvements over existing training-free methods and performance on par with training-based ones.
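As a rough illustration of the transductive prototype idea mentioned in the abstract, the sketch below refines a reference prototype using the candidate objects it already matches, then recounts. It is a minimal sketch under assumptions, not the authors' implementation: the abstract does not specify the update rule, so the cosine similarity, the mean-based update, the `threshold` and `rounds` parameters, and the name `transductive_count` are all hypothetical. Feature vectors are assumed to come from some image encoder and are given here as plain lists of floats.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def transductive_count(reference_feats, candidate_feats, threshold=0.86, rounds=1):
    """Count candidates similar to the references, refining the prototype first.

    Assumed scheme (hypothetical): the prototype starts as the mean of the
    reference features; in each round, candidates whose similarity reaches
    `threshold` are averaged into the prototype together with the references.
    """
    def match(prototype):
        return [f for f in candidate_feats if cosine(prototype, f) >= threshold]

    prototype = mean_vector(reference_feats)
    for _ in range(rounds):
        confident = match(prototype)
        prototype = mean_vector(list(reference_feats) + confident)
    return len(match(prototype)), prototype


# Toy 2-D features: one reference and three candidate objects.
references = [[1.0, 0.0]]
candidates = [[0.9, 0.1], [0.8, 0.5], [0.0, 1.0]]
count_fixed, _ = transductive_count(references, candidates, rounds=0)
count_refined, _ = transductive_count(references, candidates, rounds=1)
```

In this toy setup the second candidate falls just under the threshold against the original prototype but passes once the prototype has been pulled toward the first matched candidate, which is the effect a transductive update is meant to capture.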
Authors: Yuhao Lin, Haiming Xu, Lingqiao Liu, Javen Qinfeng Shi