VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders (2309.01141v4)

Published 3 Sep 2023 in cs.CV

Abstract: Large-scale text-to-image diffusion models have shown impressive capabilities on generative tasks by leveraging the strong vision-language alignment learned during pre-training. However, most vision-language discriminative tasks require extensive fine-tuning on carefully labeled datasets to acquire such alignment, at great cost in time and computing resources. In this work, we explore directly applying a pre-trained generative diffusion model to the challenging discriminative task of visual grounding, without any fine-tuning or additional training data. Specifically, we propose VGDiffZero, a simple yet effective zero-shot visual grounding framework based on text-to-image diffusion models. We also design a comprehensive region-scoring method that considers both the global and local contexts of each isolated proposal. Extensive experiments on RefCOCO, RefCOCO+, and RefCOCOg show that VGDiffZero achieves strong performance on zero-shot visual grounding. Our code is available at https://github.com/xuyang-liu16/VGDiffZero.
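The abstract does not spell out the scoring procedure, but the general recipe it points to (ranking isolated region proposals by how well a frozen text-to-image diffusion model denoises them when conditioned on the referring expression) can be sketched as below. This is a minimal illustration, not the paper's exact method: the model checkpoint, the choice of crop vs. mask for isolating a proposal, the number of noise samples, and the helper functions are all assumptions made for the example.

```python
# Hedged sketch: diffusion-based region scoring for zero-shot visual grounding.
# Assumption: each proposal is isolated (cropped or masked), encoded by the VAE,
# noised, and scored by the UNet's conditional denoising error given the
# referring expression; the proposal with the lowest error is selected.

import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
# Illustrative checkpoint; any pre-trained latent text-to-image diffusion model works similarly.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float32
).to(device)

@torch.no_grad()
def encode_text(expression: str) -> torch.Tensor:
    """Encode the referring expression with the frozen CLIP text encoder."""
    tokens = pipe.tokenizer(
        expression,
        padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True,
        return_tensors="pt",
    ).to(device)
    return pipe.text_encoder(tokens.input_ids)[0]

@torch.no_grad()
def score_proposal(region: torch.Tensor, text_emb: torch.Tensor,
                   n_samples: int = 8) -> float:
    """Average conditional denoising error; lower means a better match.

    `region` is a (1, 3, 512, 512) tensor in [-1, 1] holding one isolated
    proposal (e.g. its crop, or the full image with everything else masked).
    """
    latents = pipe.vae.encode(region).latent_dist.sample()
    latents = latents * pipe.vae.config.scaling_factor

    errors = []
    for _ in range(n_samples):
        t = torch.randint(0, pipe.scheduler.config.num_train_timesteps,
                          (1,), device=device)
        noise = torch.randn_like(latents)
        noisy = pipe.scheduler.add_noise(latents, noise, t)
        pred = pipe.unet(noisy, t, encoder_hidden_states=text_emb).sample
        errors.append(torch.mean((pred - noise) ** 2).item())
    return sum(errors) / len(errors)

# Usage (hypothetical `proposals` list of preprocessed region tensors):
# text_emb = encode_text("the dog on the left")
# best = min(proposals, key=lambda r: score_proposal(r, text_emb))
```

Averaging the error over several noise draws and timesteps reduces the variance of a single-sample estimate; the abstract's "global and local contexts" suggests combining scores from differently isolated views of each proposal, which would slot into the same loop.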
