From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis (2406.19934v2)

Published 28 Jun 2024 in cs.CL and cs.AI

Abstract: We explore multi-step reasoning in vision-language models (VLMs). The problem is challenging because reasoning data consisting of multiple steps of visual and language processing is scarce. To overcome this challenge, we first introduce a least-to-most visual reasoning paradigm, which interleaves steps of decomposing a question into sub-questions and invoking external tools to resolve those sub-questions. Based on this paradigm, we further propose a novel data synthesis approach that automatically creates questions and multi-step reasoning paths for an image in a bottom-up manner. Our approach divides the complex synthesis task into a few simple sub-tasks and relies (almost entirely) on open-source models to accomplish them. Therefore, the entire synthesis process is reproducible and cost-efficient, and the synthesized data is quality-guaranteed. With this approach, we construct 50k visual reasoning examples. We then develop a visual reasoner through supervised fine-tuning, which can generally enhance the reasoning abilities of a wide range of existing VLMs in a plug-and-play fashion. Extensive experiments indicate that the visual reasoner consistently and significantly improves four VLMs on four VQA benchmarks. Our code and dataset are available at https://github.com/steven-ccq/VisualReasoner.
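The abstract describes the least-to-most paradigm only at a high level. The sketch below is a minimal illustration of the general interleaved "decompose, then call a tool" loop, not the authors' implementation. It assumes a planner that, at each step, either emits a sub-question routed to a named tool or returns a final answer, with each tool's output fed back into later steps. The symbolic scene (standing in for a real image), the tools `locate`, `left_neighbor`, and `color_of`, and `scripted_planner` are all hypothetical; in the paper's setting the planner would correspond roughly to the fine-tuned visual reasoner and the tools would operate on real images.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical symbolic "image": object name -> (x coordinate, color).
Scene = Dict[str, Tuple[int, str]]
Tool = Callable[[Scene, str], str]
Action = Tuple[str, ...]  # ("ask", sub_question, tool, arg) or ("answer", text)


@dataclass
class Step:
    sub_question: str  # natural-language sub-question
    tool: str          # name of the external tool that resolves it
    arg: str           # argument passed to the tool
    answer: str        # tool output, visible to later steps


@dataclass
class Trace:
    question: str
    steps: List[Step] = field(default_factory=list)
    final_answer: Optional[str] = None


def locate(scene: Scene, name: str) -> str:
    """Hypothetical grounding tool: return the object's x coordinate."""
    return str(scene[name][0])


def left_neighbor(scene: Scene, x: str) -> str:
    """Hypothetical spatial tool: nearest object strictly left of x."""
    candidates = [(x0, n) for n, (x0, _) in scene.items() if x0 < int(x)]
    return max(candidates)[1]


def color_of(scene: Scene, name: str) -> str:
    """Hypothetical attribute tool: return the object's color."""
    return scene[name][1]


TOOLS: Dict[str, Tool] = {"locate": locate, "left_neighbor": left_neighbor, "color_of": color_of}


def scripted_planner(question: str, steps: List[Step]) -> Action:
    """Stand-in for a learned reasoner: next sub-question, or the final answer."""
    if len(steps) == 0:
        return ("ask", "Where is the book?", "locate", "book")
    if len(steps) == 1:
        return ("ask", "Which object is just left of the book?", "left_neighbor", steps[0].answer)
    if len(steps) == 2:
        return ("ask", f"What color is the {steps[1].answer}?", "color_of", steps[1].answer)
    return ("answer", steps[2].answer)


def least_to_most(question: str, scene: Scene,
                  planner: Callable[[str, List[Step]], Action],
                  tools: Dict[str, Tool], max_steps: int = 5) -> Trace:
    """Alternate between decomposition (planner) and tool calls until an answer."""
    trace = Trace(question)
    for _ in range(max_steps):
        action = planner(question, trace.steps)
        if action[0] == "answer":
            trace.final_answer = action[1]
            break
        _, sub_q, tool, arg = action
        trace.steps.append(Step(sub_q, tool, arg, tools[tool](scene, arg)))
    return trace  # final_answer stays None if the step budget runs out


scene: Scene = {"cup": (10, "red"), "lamp": (30, "green"), "book": (50, "blue")}
trace = least_to_most("What color is the object to the left of the book?",
                      scene, scripted_planner, TOOLS)
for s in trace.steps:
    print(f"{s.tool}({s.arg}) -> {s.answer}")
print(trace.final_answer)  # prints: green
```

The recorded `Trace` (question, sub-questions, tool calls, and intermediate answers) is the kind of multi-step reasoning path the abstract says is scarce; a synthesis pipeline would need to produce many such traces to supervise a reasoner.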
