All You May Need for VQA are Image Captions (2205.01883v1)

Published 4 May 2022 in cs.CV and cs.CL

Abstract: Visual Question Answering (VQA) has benefited from increasingly sophisticated models, but has not enjoyed the same level of engagement in terms of data creation. In this paper, we propose a method that automatically derives VQA examples at volume, by leveraging the abundance of existing image-caption annotations combined with neural models for textual question generation. We show that the resulting data is of high quality. VQA models trained on our data improve state-of-the-art zero-shot accuracy by double digits and achieve a level of robustness that the same model lacks when trained on human-annotated VQA data.
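The abstract describes the data flow only at a high level: start from image-caption pairs, pick answer candidates from each caption, and generate a question for each candidate, attaching the resulting (question, answer) pair to the caption's image. The sketch below illustrates that flow under stated assumptions and is not the authors' implementation. The neural components are replaced by deliberately crude, hypothetical stand-ins (`toy_extract_answers` keeps non-stopword tokens; `toy_generate_question` masks the answer with "what"). All names (`VQAExample`, `captions_to_vqa`, `STOPWORDS`) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class VQAExample:
    """One derived VQA example: an image paired with a question and its answer."""
    image_id: str
    question: str
    answer: str


# Hypothetical stopword list, used only by the toy answer extractor below.
STOPWORDS = {"a", "an", "the", "on", "in", "of", "with", "is", "are", "and", "at"}


def toy_extract_answers(caption: str) -> List[str]:
    """Toy stand-in for answer-candidate extraction: every non-stopword token."""
    tokens = caption.lower().rstrip(".").split()
    return [t for t in tokens if t not in STOPWORDS]


def toy_generate_question(caption: str, answer: str) -> str:
    """Toy stand-in for a neural question generator: replaces the answer with 'what'."""
    tokens = caption.lower().rstrip(".").split()
    masked = ["what" if t == answer else t for t in tokens]
    return " ".join(masked) + "?"


def captions_to_vqa(
    captions: Dict[str, str],
    extract: Callable[[str], List[str]] = toy_extract_answers,
    generate: Callable[[str, str], str] = toy_generate_question,
) -> List[VQAExample]:
    """Derive (image, question, answer) triples from image-caption pairs.

    The caption text is used only to build QA pairs; each pair inherits the
    caption's image, which is what turns caption data into VQA data.
    """
    examples: List[VQAExample] = []
    for image_id, caption in captions.items():
        for answer in extract(caption):
            question = generate(caption, answer)
            examples.append(VQAExample(image_id, question, answer))
    return examples


if __name__ == "__main__":
    for ex in captions_to_vqa({"img1": "A red bus parked on the street."}):
        print(ex)
```

Because the extractor and generator are passed in as callables, the toy stand-ins could be swapped for real models without changing the pipeline. That separation is a design choice of this sketch, not something the abstract specifies.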

Authors (6)
  1. Soravit Changpinyo (24 papers)
  2. Doron Kukliansky (3 papers)
  3. Idan Szpektor (47 papers)
  4. Xi Chen (1040 papers)
  5. Nan Ding (57 papers)
  6. Radu Soricut (54 papers)
Citations (63)