Detecting Hallucinated Content in Conditional Neural Sequence Generation (2011.02593v3)

Published 5 Nov 2020 in cs.CL and cs.AI

Abstract: Neural sequence models can generate highly fluent sentences, but recent studies have shown that they are also prone to hallucinate additional content not supported by the input. This variety of fluent but wrong output is particularly problematic, as users will not be able to tell that they are being presented with incorrect content. To detect these errors, we propose a task to predict whether each token in the output sequence is hallucinated (not contained in the input) and collect new manually annotated evaluation sets for this task. We also introduce a method for learning to detect hallucinations using pretrained LLMs fine-tuned on synthetic data that includes automatically inserted hallucinations. Experiments on machine translation (MT) and abstractive summarization demonstrate that our proposed approach consistently outperforms strong baselines on all benchmark datasets. We further demonstrate how to use the token-level hallucination labels to define a fine-grained loss over the target sequence in low-resource MT and achieve significant improvements over strong baseline methods. We also apply our method to word-level quality estimation for MT and show its effectiveness in both supervised and unsupervised settings. Code and data are available at https://github.com/violet-zct/fairseq-detect-hallucination.

Summary

The paper introduces a token-level task and detection method for flagging output tokens that are not supported by the input.
Detectors are trained on synthetic data containing automatically inserted hallucinations and evaluated on new manually annotated sets, reaching an average F1 score of around 0.6 on machine translation benchmarks.
Using the token-level hallucination labels to define a fine-grained training loss improves translation quality in low-resource MT.

Detecting Hallucinated Content in Conditional Neural Sequence Generation

The paper "Detecting Hallucinated Content in Conditional Neural Sequence Generation" addresses hallucination in neural sequence models. Hallucinations are fluent output that is not supported by the input, and they are a known problem in applications such as machine translation (MT) and abstractive summarization. The problem is serious precisely because the output is fluent: users have no surface cue that the content is wrong and may accept it as correct.

Methodology and Results

The authors frame hallucination detection as a token-level task: for each token in an output sequence, predict whether it is hallucinated, that is, not supported by the input. To evaluate systems on this task, they collected new manually annotated evaluation sets for MT and abstractive summarization.

The detectors are not trained on the manual annotations. Instead, the authors fine-tune pretrained LLMs on synthetic data into which hallucinations have been inserted automatically, so that token-level training labels come for free.
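
To make the idea of automatically labeled synthetic data concrete, the sketch below labels each token of a perturbed sentence by aligning it against the original sentence. It is only an illustration: the function name `label_inserted_tokens`, the use of Python's `difflib` for alignment, and the hand-written perturbation are all assumptions of this example, not the authors' pipeline, which this summary does not describe in detail.

```python
from difflib import SequenceMatcher


def label_inserted_tokens(reference, perturbed):
    """Return one 0/1 label per token of `perturbed`.

    A token is labeled 0 when it falls inside a block that aligns with
    `reference`, and 1 otherwise (it was inserted or substituted).
    This alignment rule is an assumption made for illustration.
    """
    labels = [1] * len(perturbed)
    matcher = SequenceMatcher(a=reference, b=perturbed, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Each block says reference[block.a : block.a + block.size]
        # equals perturbed[block.b : block.b + block.size].
        for j in range(block.b, block.b + block.size):
            labels[j] = 0
    return labels


# Hypothetical example: a faithful sentence and a perturbed copy.
reference = "the cat sat on the mat".split()
perturbed = "the black cat sat on the sofa".split()

for token, label in zip(perturbed, label_inserted_tokens(reference, perturbed)):
    print(f"{token}\t{label}")
```

In this toy pair, "black" and "sofa" are the only tokens with no aligned counterpart in the original, so they are the ones labeled 1. Pairs of this kind (input, perturbed output, token labels) are the sort of supervision a token classifier could be fine-tuned on.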

Experiments on MT and abstractive summarization benchmarks show that the proposed approach outperforms strong baselines. In MT, for instance, the detector reaches an average F1 score of around 0.6 on the token-level labeling task, which leaves clear room for improvement and gives later work a baseline to measure against.
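
For readers unfamiliar with the metric, the following minimal sketch computes precision, recall, and F1 for the "hallucinated" class over token-level labels. The function name `token_f1` and the example labels are hypothetical, and how the paper pools or averages scores across sentences and datasets is not stated in this summary, so treat this as the generic definition only.

```python
def token_f1(gold, pred):
    """Precision, recall, and F1 for the positive (hallucinated = 1) class.

    `gold` and `pred` are equal-length sequences of 0/1 token labels.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical labels for a seven-token output.
gold = [0, 1, 0, 0, 0, 0, 1]
pred = [0, 1, 1, 0, 0, 0, 0]
precision, recall, f1 = token_f1(gold, pred)
print(precision, recall, f1)  # 0.5 0.5 0.5
```

Here the detector finds one of the two hallucinated tokens and raises one false alarm, so precision, recall, and F1 are all 0.5.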

Practical and Theoretical Implications

The work bears directly on the reliability of neural sequence models. Better hallucination detection lets NLG systems flag or filter unsupported content before it reaches users. Because detection operates on individual tokens, it localizes generation errors rather than only scoring whole outputs, which makes it applicable to word-level quality estimation for MT without relying on reference text.

The paper also applies the hallucination labels to MT training under low-resource conditions. The predicted token-level labels are used to define a fine-grained loss over the target sequence, and the authors report significant improvements in translation quality over strong baselines. The result suggests that token-level filtering is one way to get more out of noisy training data.
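
One plausible reading of a "fine-grained loss over the target sequence" is a token-level loss that skips target tokens flagged as hallucinated. The sketch below shows that reading with plain Python. It is an assumption made for illustration: the function name `masked_nll`, the choice to drop flagged tokens entirely, and the example probabilities are not taken from the paper, whose exact loss this summary does not give.

```python
import math


def masked_nll(token_probs, hallucination_labels):
    """Mean negative log-likelihood over target tokens labeled 0.

    `token_probs[i]` is the model's probability of the i-th target token,
    and `hallucination_labels[i]` is 1 if a detector flagged that token.
    Flagged tokens contribute nothing to the loss (illustrative choice).
    """
    if len(token_probs) != len(hallucination_labels):
        raise ValueError("inputs must have the same length")

    kept = [
        -math.log(p)
        for p, flagged in zip(token_probs, hallucination_labels)
        if flagged == 0
    ]
    if not kept:
        return 0.0  # every token was flagged, so nothing to learn from
    return sum(kept) / len(kept)


# Hypothetical three-token target whose middle token was flagged.
probs = [0.5, 0.25, 0.5]
labels = [0, 1, 0]
print(masked_nll(probs, labels))  # mean of -log(0.5) over two tokens
```

With an all-zero label vector this reduces to the usual average token-level negative log-likelihood, so the mask only changes training on sentences where the detector fires.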

Future Directions

Several directions follow from this work. One is testing how well hallucination detectors transfer across domains and languages. Another is building detection into the generation model itself, so that hallucinations are suppressed at the source rather than flagged afterwards. A third is refining synthetic data generation so that detectors trained on it agree more closely with human judgments.

Overall, the paper defines a token-level hallucination detection task, provides annotated data and a trainable detector for it, and shows that the resulting labels are useful in practice for low-resource MT training and word-level quality estimation.