
VLCounter: Text-aware Visual Representation for Zero-Shot Object Counting (2312.16580v2)

Published 27 Dec 2023 in cs.CV

Abstract: Zero-Shot Object Counting (ZSOC) aims to count referred instances of arbitrary classes in a query image without human-annotated exemplars. To address ZSOC, preceding studies proposed a two-stage pipeline: discovering exemplars and counting. However, the sequential two-stage design remains vulnerable to error propagation. In this work, a one-stage baseline, Visual-Language Baseline (VLBase), which explores the implicit association of the semantic-patch embeddings of CLIP, is proposed. VLBase is then extended to Visual-language Counter (VLCounter) by incorporating three modules devised to tailor VLBase for object counting. First, Semantic-conditioned Prompt Tuning (SPT) is introduced within the image encoder to acquire target-highlighted representations. Second, Learnable Affine Transformation (LAT) is employed to translate the semantic-patch similarity map into a form appropriate for the counting task. Lastly, the layer-wise encoded features are transferred to the decoder through Segment-aware Skip Connection (SaSC) to preserve the generalization capability for unseen classes. Through extensive experiments on FSC147, CARPK, and PUCPR+, the benefits of the end-to-end framework, VLCounter, are demonstrated.
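
The abstract does not spell out how the semantic-patch similarity map or the affine transformation is computed, so the following is only a minimal sketch under stated assumptions: the similarity is taken to be cosine similarity between one text embedding and each patch embedding, and the affine step is taken to be an elementwise scale and shift. The function names, the random stand-in embeddings, and the fixed `scale` and `shift` values are all hypothetical; in a trained model the embeddings would come from CLIP and the affine parameters would be learned.

```python
import numpy as np

def similarity_map(patch_emb, text_emb, grid_hw):
    """Cosine similarity between each patch embedding and one text embedding.

    patch_emb: (num_patches, dim) array, text_emb: (dim,) array.
    Returns an array reshaped to the patch grid (height, width).
    """
    p = patch_emb / np.linalg.norm(patch_emb, axis=-1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    sim = p @ t  # one similarity score per patch
    return sim.reshape(grid_hw)

def affine(sim, scale, shift):
    """Elementwise affine map (assumed form); parameters would be learned."""
    return scale * sim + shift

# Random stand-ins for CLIP patch and text embeddings (assumption).
rng = np.random.default_rng(0)
h, w, d = 4, 4, 8
patches = rng.normal(size=(h * w, d))
text = rng.normal(size=d)

sim = similarity_map(patches, text, (h, w))
out = affine(sim, scale=2.0, shift=0.5)
```

Here `sim` holds one score per patch on the 4x4 grid, and `out` is the rescaled map that a decoder could consume in place of the raw similarities.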

