A Simple-but-effective Baseline for Training-free Class-Agnostic Counting (2403.01418v1)

Published 3 Mar 2024 in cs.CV

Abstract: Class-Agnostic Counting (CAC) seeks to accurately count objects in a given image using only a few reference examples. While previous methods achieved this through additional training, recent efforts have shown that it is possible to do so without training by leveraging pre-existing foundation models, particularly the Segment Anything Model (SAM), to count via instance-level segmentation. Although promising, current training-free methods still lag behind their training-based counterparts in performance. In this work, we present a straightforward training-free solution that effectively bridges this performance gap and serves as a strong baseline. Our primary contribution is the identification of four key techniques that enhance performance: employing a superpixel algorithm to generate more precise initial point prompts, replacing the SAM encoder with an image encoder that carries richer semantic knowledge for representing candidate objects, and adopting a multiscale mechanism and a transductive prototype scheme to update the representation of the reference examples. By combining these four techniques, our approach achieves significant improvements over existing training-free methods and delivers performance on par with training-based ones.

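The abstract outlines a pipeline of the general form: superpixel centroids serve as point prompts for a segmenter, each candidate mask is embedded with a semantic encoder, and candidates are counted by similarity to a prototype built from the reference examples, which is refined transductively. The sketch below is not the authors' implementation; it is a minimal illustration that assumes candidate masks come from an external segmenter (e.g. SAM) prompted at superpixel centroids, substitutes a toy colour feature for a semantic encoder such as DINOv2, omits the multiscale step, and uses hypothetical helper names (superpixel_point_prompts, toy_feature, count_by_similarity).

```python
# Minimal, assumption-laden sketch of a training-free counting pipeline.
# Candidate masks are assumed to be produced elsewhere (e.g. by SAM prompted
# with the superpixel centroids below); the feature extractor is a toy
# stand-in for pooled semantic-encoder features.
import numpy as np
from skimage.segmentation import slic
from skimage.measure import regionprops


def superpixel_point_prompts(image, n_segments=200):
    """Return one (x, y) point prompt per superpixel centroid."""
    labels = slic(image, n_segments=n_segments, start_label=1)
    return np.array([prop.centroid[::-1] for prop in regionprops(labels)])


def toy_feature(image, mask):
    """Toy stand-in for encoder features: mean colour inside the mask."""
    return image[mask].mean(axis=0)


def count_by_similarity(image, candidate_masks, exemplar_masks,
                        threshold=0.9, rounds=2):
    """Count candidates similar to the exemplar prototype, refining the
    prototype transductively with confident matches for a few rounds."""
    cand = np.stack([toy_feature(image, m) for m in candidate_masks])
    proto = np.stack([toy_feature(image, m) for m in exemplar_masks]).mean(axis=0)
    keep = np.zeros(len(cand), dtype=bool)
    for _ in range(rounds):
        sim = cand @ proto / (
            np.linalg.norm(cand, axis=1) * np.linalg.norm(proto) + 1e-8)
        keep = sim >= threshold
        if keep.any():
            # Transductive update: fold confident candidates into the prototype.
            proto = np.concatenate([cand[keep], proto[None]]).mean(axis=0)
    return int(keep.sum())
```

In a real pipeline the toy colour feature would be replaced by mask-pooled patch features from a strong self-supervised encoder, and masks would be gathered at several scales before matching.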
Authors (4)
  1. Yuhao Lin (10 papers)
  2. Haiming Xu (6 papers)
  3. Lingqiao Liu (114 papers)
  4. Javen Qinfeng Shi (34 papers)
