Papers
Topics
Authors
Recent
2000 character limit reached

Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination (2403.14401v2)

Published 21 Mar 2024 in cs.CV

Abstract: Multi-modal LLMs (MLLMs) demonstrate remarkable success across various vision-language tasks. However, they suffer from visual hallucination, where the generated responses diverge from the provided image. Are MLLMs oblivious to the accurate visual cues when they hallucinate? Our investigation reveals that the visual branch may equally advocate both accurate and erroneous content. To address this issue, we propose Pensieve, a training-free method that leverages the analogous visual hallucinations, which are induced by images sharing common semantic and appearance characteristics, to mitigate hallucination. Specifically, Pensieve enables MLLMs to retrospect relevant images as references and compare their visual content with the test image via confidence score subtraction. Moreover, our paradigm balances the effects of addressing errors from both the visual and textual branches by adaptively scaling the subtracted scores. Experiments on Whoops, LLaVA Bench, POPE, and MME demonstrate the efficacy of Pensieve in mitigating visual hallucination, surpassing other advanced decoding strategies. Pensieve also aids MLLMs in identifying visual details and enhance the specificity of generated image descriptions.

Citations (6)

Summary

We haven't generated a summary for this paper yet.

Whiteboard

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.