- The paper proposes a mutual information-based iterative prompting method to quantify epistemic uncertainty and flag potential hallucinations.
- It employs a pseudo joint distribution and calibration algorithm to distinguish between epistemic and aleatoric uncertainty in LLM outputs.
- Experimental results on TriviaQA, AmbigQA, and WordNet demonstrate superior performance in detecting hallucinations compared to standard methods.
To Believe or Not to Believe Your LLM
Introduction
This paper addresses a critical problem in LLMs: quantifying the uncertainty in their outputs in order to detect when a response may be a hallucination. Hallucination here refers to fluent, plausible-sounding text that is nevertheless factually incorrect or unsupported by the given context. The paper introduces methods to quantify and differentiate between epistemic and aleatoric uncertainty in LLM outputs, with a view to preventing hallucinations.
Uncertainty in LLMs
The paper distinguishes between epistemic uncertainty, which reflects the model's lack of knowledge about the ground truth (due to limited knowledge or inadequate modeling), and aleatoric uncertainty, which stems from irreducible randomness in the ground truth itself (for example, when several different responses are equally valid). The authors propose an information-theoretic approach for detecting high epistemic uncertainty and, consequently, likely hallucination: when epistemic uncertainty is high, the LLM's response is unlikely to be reliable.
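One common way to make this distinction concrete is sketched below; the notation is ours rather than the paper's exact setup. Epistemic uncertainty measures how far the model's answer distribution is from the ground truth, while aleatoric uncertainty is the inherent spread of the ground truth itself.

```latex
% Rough formalization (our notation): P is the ground-truth answer distribution
% for query x, Q is the LLM's answer distribution for the same query.
\[
\underbrace{\mathrm{KL}\!\big(P(\cdot \mid x)\,\|\,Q(\cdot \mid x)\big)}_{\text{epistemic: gap between model and ground truth}}
\qquad \text{vs.} \qquad
\underbrace{H\big(P(\cdot \mid x)\big)}_{\text{aleatoric: inherent spread of valid answers}}
\]
```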
Iterative Prompting and Joint Distributions
To estimate epistemic uncertainty, the paper employs an iterative prompting strategy: the LLM is queried repeatedly about the same question, with its earlier responses inserted back into the prompt, and the resulting answers are used to form a pseudo joint distribution over sequences of responses. The mutual information of this pseudo joint distribution serves as the metric for epistemic uncertainty. A large value indicates that the model's answers are strongly swayed by previously supplied (possibly incorrect) answers, signalling that its response distribution is far from the ground truth; a value near zero is consistent with low epistemic uncertainty.
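A minimal sketch of what such an estimate could look like in practice, assuming a hypothetical `query_llm` helper and an illustrative prompt template (neither is the paper's exact implementation): earlier answers are fed back into the prompt, an empirical pseudo joint distribution over answer tuples is collected, and its mutual information (multi-information for more than two rounds) is computed.

```python
import math
from collections import Counter

def query_llm(prompt: str) -> str:
    """Hypothetical helper: sample one response from the LLM for the given prompt."""
    raise NotImplementedError  # plug in the model/API of your choice

def pseudo_joint(question: str, k: int = 2, n_samples: int = 50) -> dict:
    """Empirical pseudo joint distribution over k-tuples of responses,
    obtained by feeding earlier responses back into the prompt."""
    counts = Counter()
    for _ in range(n_samples):
        context, chain = question, []
        for _ in range(k):
            answer = query_llm(context)
            chain.append(answer)
            # Illustrative prompt template, not the paper's exact wording.
            context += f"\nOne possible answer is: {answer}. Answer the question again."
        counts[tuple(chain)] += 1
    return {resp: c / n_samples for resp, c in counts.items()}

def mutual_information(joint: dict, k: int = 2) -> float:
    """KL divergence between the empirical joint and the product of its marginals
    (mutual information for k = 2, multi-information for k > 2)."""
    marginals = [Counter() for _ in range(k)]
    for resp, p in joint.items():
        for i in range(k):
            marginals[i][resp[i]] += p
    mi = 0.0
    for resp, p in joint.items():
        prod = math.prod(marginals[i][resp[i]] for i in range(k))
        mi += p * math.log(p / prod)
    return mi
```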
Figure 1
Figure 1: Single-label queries with low epistemic uncertainty: Conditional normalized probability of the correct completion given repetitions of an incorrect response.
Model Implementation and Calibration
The paper presents an algorithm for estimating this mutual information from the pseudo joint distributions produced by iterative prompting, together with a calibration procedure for the resulting score. Building on this estimate, the authors propose an abstention rule: the LLM declines to respond when the epistemic-uncertainty score surpasses a chosen threshold, thereby avoiding a likely hallucination.
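Reusing the helpers from the sketch above, the abstention rule can be expressed in a few lines. The threshold value and the `hallucination_score` name are illustrative, and the paper's finite-sample calibration (e.g., via missing-mass bounds) is omitted here.

```python
def hallucination_score(question: str) -> float:
    """Illustrative epistemic-uncertainty score: the mutual information of the
    empirical pseudo joint distribution (finite-sample corrections omitted)."""
    return mutual_information(pseudo_joint(question, k=2, n_samples=50), k=2)

def answer_or_abstain(question: str, threshold: float = 0.5) -> str:
    """Abstain when the estimated epistemic uncertainty exceeds the threshold."""
    if hallucination_score(question) > threshold:
        return "I am not confident enough to answer this question."
    return query_llm(question)
```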
Figure 2
Figure 2: Multi-label queries with aleatoric uncertainty: conditional normalized probability of the first of two provided responses (both correct), given repetitions of the second response.
Experimental Validation
The authors validate their approach on TriviaQA and AmbigQA, and construct multi-label queries from WordNet to test scenarios with several valid answers. The experiments compare the mutual-information-based score against baselines such as thresholding the probability of the greedy response and self-verification, in which the LLM is prompted to assess its own answer. The results show superior hallucination detection, especially on mixed datasets where the method must distinguish queries that induce epistemic uncertainty from those that merely induce aleatoric uncertainty.
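For contrast, a sketch of the simplest baseline mentioned above: thresholding the probability of the greedy answer. Here `greedy_answer_with_logprob` is a hypothetical helper standing in for whatever log-probability access a given model exposes.

```python
import math

def greedy_answer_with_logprob(question: str) -> tuple[str, float]:
    """Hypothetical helper: the greedy answer and its total log-probability."""
    raise NotImplementedError

def probability_threshold_baseline(question: str, min_prob: float = 0.3) -> str:
    """Baseline: abstain when the greedy answer's probability is low.
    Unlike the mutual-information score, this conflates epistemic and aleatoric
    uncertainty: a query with many valid answers assigns each answer a low
    probability even when the model is not hallucinating."""
    answer, logprob = greedy_answer_with_logprob(question)
    if math.exp(logprob) < min_prob:
        return "I am not confident enough to answer this question."
    return answer
```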
Figure 3
Figure 3: Empirical distributions of bounds on the missing mass, which inform the calibration of the method and the understanding of LLM behavior under different types of uncertainty.
Conclusion
The paper provides a methodological advance for quantifying epistemic uncertainty in LLMs using iterative prompting and mutual information. The approach lets practitioners set thresholds at which an LLM abstains from generating a potentially incorrect response, substantially reducing the likelihood of hallucination. Future work could refine these techniques and explore their applicability across different model families and sizes.
By advancing our understanding and control over LLMs, this research contributes valuable insights into designing more reliable and robust language-based AI systems.