The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models (2401.03205v1)
Abstract: In the era of LLMs, hallucination (i.e., the tendency to generate factually incorrect content) poses a great challenge to the trustworthy and reliable deployment of LLMs in real-world applications. To tackle LLM hallucination, three key questions should be well studied: how to detect hallucinations (detection), why LLMs hallucinate (source), and what can be done to mitigate them (mitigation). To address these challenges, this work presents a systematic empirical study on LLM hallucination, focused on the three aspects of hallucination detection, source, and mitigation. Specifically, we construct a new hallucination benchmark, HaluEval 2.0, and design a simple yet effective detection method for LLM hallucination. Furthermore, we zoom into the different training and utilization stages of LLMs and extensively analyze the potential factors that lead to LLM hallucination. Finally, we implement and examine a series of widely used techniques to mitigate hallucinations in LLMs. Our work yields several important findings for understanding the origins of hallucination and for mitigating hallucinations in LLMs. Our code and data can be accessed at https://github.com/RUCAIBox/HaluEval-2.0.
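The abstract does not spell out how the detection method works; as a rough illustration of what a claim-level hallucination check can look like, the sketch below extracts factual statements from a model response and verifies each one against reference evidence. All names here (the `chat` helper, the prompts, the scoring function) are hypothetical placeholders for illustration and are not the method proposed in the paper.

```python
# Illustrative sketch only: a generic two-step hallucination check
# (extract factual claims, then verify each claim against evidence).
# The `chat` helper and prompt wording are assumptions, not the paper's method.

from typing import List


def chat(prompt: str) -> str:
    """Placeholder for a call to any chat-style LLM API."""
    raise NotImplementedError("plug in your preferred LLM client here")


def extract_claims(response: str) -> List[str]:
    # Ask the model to break a response into atomic factual statements.
    out = chat(
        "List each factual claim in the following text, one per line:\n" + response
    )
    return [line.strip("- ").strip() for line in out.splitlines() if line.strip()]


def verify_claim(claim: str, evidence: str) -> bool:
    # Ask the model whether the evidence supports the claim.
    verdict = chat(
        f"Evidence:\n{evidence}\n\nClaim: {claim}\n"
        "Answer strictly 'supported' or 'unsupported'."
    )
    return verdict.strip().lower().startswith("supported")


def hallucination_rate(response: str, evidence: str) -> float:
    # Fraction of extracted claims that the evidence does not support.
    claims = extract_claims(response)
    if not claims:
        return 0.0
    unsupported = sum(not verify_claim(c, evidence) for c in claims)
    return unsupported / len(claims)
```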
Authors: Junyi Li, Jie Chen, Ruiyang Ren, Xiaoxue Cheng, Wayne Xin Zhao, Jian-Yun Nie, Ji-Rong Wen