Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words? (2405.16908v2)
Abstract: We posit that LLMs should be capable of expressing their intrinsic uncertainty in natural language. For example, if the LLM is equally likely to output two contradicting answers to the same question, then its generated response should reflect this uncertainty by hedging its answer (e.g., "I'm not sure, but I think..."). We formalize faithful response uncertainty based on the gap between the model's intrinsic confidence in the assertions it makes and the decisiveness with which they are conveyed. This example-level metric reliably indicates whether the model reflects its uncertainty, as it penalizes both excessive and insufficient hedging. We evaluate a variety of aligned LLMs at faithfully communicating uncertainty on several knowledge-intensive question answering tasks. Our results provide strong evidence that modern LLMs are poor at faithfully conveying their uncertainty, and that better alignment is necessary to improve their trustworthiness.
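
To make the idea concrete, here is a minimal illustrative sketch of an example-level faithfulness score built from the two quantities the abstract names: the model's intrinsic confidence in each assertion and the decisiveness of the wording that conveys it. This is an assumption about how such a gap-based metric could be computed, not the paper's official implementation; the function name `faithfulness`, the dictionary keys, and the specific scores are hypothetical.

```python
# Illustrative sketch (assumed, not the paper's exact metric): score one
# response by how well the decisiveness of its wording tracks the model's
# intrinsic confidence in each assertion. Both values are assumed in [0, 1].

from statistics import mean


def faithfulness(assertions: list[dict]) -> float:
    """Each assertion is a dict with:
      'confidence'   -- intrinsic probability the model assigns to the claim
      'decisiveness' -- how certain the phrasing sounds (1.0 = fully decisive;
                        lower for hedges like "I'm not sure, but...")
    Returns 1 minus the mean absolute gap, so 1.0 means the hedging exactly
    mirrors the intrinsic uncertainty. Both overconfident wording and
    unnecessary hedging shrink the score symmetrically.
    """
    gaps = [abs(a["decisiveness"] - a["confidence"]) for a in assertions]
    return 1.0 - mean(gaps)


# A model that is only 55% sure of its answer:
print(faithfulness([{"confidence": 0.55, "decisiveness": 0.95}]))  # 0.60: stated too decisively
print(faithfulness([{"confidence": 0.55, "decisiveness": 0.55}]))  # 1.00: hedging matches uncertainty
```

Under this reading, a response scores highly only when its verbal hedging (or lack of it) matches the model's internal confidence, which is why the metric penalizes both excessive and insufficient hedging.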