
Interpreting and Mitigating Hallucination in MLLMs through Multi-agent Debate (2407.20505v1)

Published 30 Jul 2024 in cs.CV

Abstract: Multimodal Large Language Models (MLLMs) often generate outputs that are inconsistent with the visual content, a challenge known as hallucination. Previous methods focus on determining whether a generated output is hallucinated, without identifying which image region leads to the hallucination or interpreting why such hallucinations occur. In this paper, we argue that hallucination in MLLMs is partially due to a lack of slow thinking and divergent thinking in these models. To address this, we propose adopting a self-reflection scheme to promote slow thinking. Furthermore, we treat eliminating hallucination as a complex reasoning task and propose a multi-agent debate approach to encourage divergent thinking. Consequently, our approach can not only mitigate hallucinations but also interpret why they occur and detail the specifics of each hallucination. In addition, we propose to distinguish creativity from hallucination in the context of MLLMs, and illustrate how to evaluate MLLMs' creative capabilities. Extensive experiments on various benchmarks demonstrate that our approach exhibits generalized hallucination-mitigating performance across several MLLMs.
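The abstract describes two mechanisms: a self-reflection scheme to promote slow thinking, and a multi-agent debate to encourage divergent thinking, which together are meant to localize, explain, and mitigate hallucinations. The Python sketch below illustrates how such a pipeline might look in practice; it is a minimal illustration, not the authors' exact protocol. The `MLLMFn` wrapper, the prompts, and the agent/round counts are all assumptions introduced for this example.

```python
# Minimal sketch of the two ideas in the abstract: self-reflection
# ("slow thinking") and multi-agent debate ("divergent thinking").
# MLLMFn stands in for any multimodal chat API: it takes an image path
# and a text prompt and returns the model's text response. All prompts
# and loop sizes here are illustrative assumptions.

from typing import Callable, List

MLLMFn = Callable[[str, str], str]  # (image_path, prompt) -> text response


def self_reflect(agent: MLLMFn, image: str, question: str, answer: str) -> str:
    """One slow-thinking pass: the model re-examines its own draft answer."""
    prompt = (
        f"Question: {question}\nYour previous answer: {answer}\n"
        "Re-examine the image carefully. List any claims in the answer that are "
        "not supported by the visual content, say which image region is involved, "
        "then give a corrected answer."
    )
    return agent(image, prompt)


def multi_agent_debate(
    agents: List[MLLMFn], image: str, question: str, rounds: int = 2
) -> str:
    """Divergent thinking: agents answer, critique each other, then a judge decides."""
    answers = [agent(image, f"Answer concisely: {question}") for agent in agents]

    for _ in range(rounds):
        revised = []
        for i, agent in enumerate(agents):
            others = "\n".join(
                f"Agent {j}: {a}" for j, a in enumerate(answers) if j != i
            )
            prompt = (
                f"Question: {question}\nOther agents said:\n{others}\n"
                f"Your answer was: {answers[i]}\n"
                "Point out which statements (yours or theirs) contradict the image, "
                "identify the relevant image region, and revise your answer."
            )
            revised.append(agent(image, prompt))
        answers = revised

    judge = agents[0]  # in this sketch, any agent can serve as the judge
    transcript = "\n".join(f"Agent {i}: {a}" for i, a in enumerate(answers))
    verdict_prompt = (
        f"Question: {question}\nDebate transcript:\n{transcript}\n"
        "Summarize the consensus answer, state which claims (if any) were "
        "hallucinated, and explain what in the image contradicts them."
    )
    return judge(image, verdict_prompt)
```

In this reading, `self_reflect` corresponds to the slow-thinking component and `multi_agent_debate` to the divergent-thinking component; because the judge is asked to name the hallucinated claims and the contradicting image content, the output both mitigates and interprets the hallucination, in the spirit the abstract describes.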
