Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty

(2401.06730)
Published Jan 12, 2024 in cs.CL , cs.AI , and cs.HC

Abstract

As natural language becomes the default interface for human-AI interaction, there is a critical need for LMs to appropriately communicate uncertainties in downstream applications. In this work, we investigate how LMs incorporate confidence about their responses via natural language and how downstream users behave in response to LM-articulated uncertainties. We examine publicly deployed models and find that LMs are reluctant to express uncertainties when answering questions even when they produce incorrect responses. LMs can be explicitly prompted to express confidences, but tend to be overconfident, resulting in high error rates (on average 47%) among confident responses. We test the risks of LM overconfidence by running human experiments and show that users rely heavily on LM generations, whether or not they are marked by certainty. Lastly, we investigate the preference-annotated datasets used in RLHF alignment and find that humans have a bias against texts with uncertainty. Our work highlights a new set of safety harms facing human-LM interactions and proposes design recommendations and mitigating strategies moving forward.

Overview

  • Language models often fail to express uncertainty accurately, potentially misleading users.

  • Analyses of epistemic markers, the linguistic devices that signal certainty or doubt, show that LMs lean toward overconfidence even when their answers are incorrect.

  • Users tend to over-rely on LM responses whether or not those responses are presented with expressions of confidence, which skews decision-making.

  • Human annotator bias during reinforcement learning from human feedback (RLHF) contributes to models' overconfidence.

  • Refining LM training processes may lead to better alignment with human-AI communication norms.

Introduction to Language Models and Epistemic Markers

Language models (LMs) like GPT, LLaMA-2, and Claude are at the forefront of human-AI interfaces, facilitating a range of tasks through natural language interaction. A critical aspect of this interface is a model's ability to communicate its confidence, or lack thereof, in its responses, which is particularly consequential in information-seeking scenarios. Epistemic markers, the linguistic devices that convey a speaker's degree of certainty (for example, "I'm sure" versus "I think"), are one way to communicate these uncertainties clearly. However, the research shows that LMs struggle to express uncertainty accurately, which can impair users' decision-making when they rely on AI-generated information.

Investigating Expression of Uncertainty in LMs

The paper's analysis indicates that LMs rarely volunteer uncertainty on their own and, even with explicit prompting, skew toward overconfidence. When asked to articulate their confidence in a response using epistemic markers, LMs use strengtheners (expressions of certainty) far more often than weakeners (expressions of uncertainty), even though a significant portion of those confidently marked responses are incorrect.
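
As a rough illustration of this kind of measurement (a sketch under assumed marker lists, not the paper's exact protocol), one can tag each response for the epistemic markers it contains and compute the error rate among responses tagged as confident:

```python
# Sketch: tag LM answers by epistemic marker and measure the error rate among
# confidently worded answers. The marker lists and demo data are illustrative only.

STRENGTHENERS = {"certainly", "definitely", "without a doubt", "clearly", "i'm sure"}
WEAKENERS = {"i think", "maybe", "possibly", "it could be", "i'm not sure"}

def tag_markers(response: str) -> str:
    """Label a response as 'weakener', 'strengthener', or 'plain'."""
    text = response.lower()
    if any(w in text for w in WEAKENERS):
        return "weakener"
    if any(s in text for s in STRENGTHENERS):
        return "strengthener"
    return "plain"

def confident_error_rate(answers) -> float:
    """Fraction of strengthener-marked answers that are wrong.

    `answers` is an iterable of (response_text, is_correct) pairs.
    """
    confident = [ok for resp, ok in answers if tag_markers(resp) == "strengthener"]
    return 0.0 if not confident else sum(not ok for ok in confident) / len(confident)

# Toy example: two confidently worded answers, one of them incorrect.
demo = [
    ("The answer is certainly Paris.", True),
    ("It is definitely 1947.", False),
    ("I think it might be oxygen.", True),
]
print(confident_error_rate(demo))  # 0.5
```

In the paper's experiments, the analogous error rate among confidently marked responses averaged 47%.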

User Response to LMs' Confidence Expressions

Understanding how users interpret and rely on epistemic markers from LMs is crucial. The paper's human experiments find that users rely heavily on LM responses whether those responses carry expressions of high confidence or no epistemic markers at all; plain statements are implicitly read as certain. Even modest miscalibration in how LMs use these markers can substantially degrade user performance over time. The tendency of LMs to convey overconfidence therefore encourages over-reliance on AI, underscoring the need for linguistic calibration: aligning the confidence a model expresses with its actual accuracy.
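
A minimal sketch of how such reliance might be quantified (the trial format and field names are assumptions for illustration, not the authors' implementation) is to compute, per marker type, how often a participant's final answer matches the LM's suggestion:

```python
# Sketch: measure user reliance as the rate at which participants' final answers
# match the LM's suggestion, split by how the suggestion was marked.
# The trial records below are hypothetical.
from collections import defaultdict

def reliance_by_marker(trials):
    """`trials`: iterable of dicts with 'marker', 'lm_answer', 'user_answer'."""
    agree, total = defaultdict(int), defaultdict(int)
    for t in trials:
        total[t["marker"]] += 1
        agree[t["marker"]] += int(t["user_answer"] == t["lm_answer"])
    return {marker: agree[marker] / total[marker] for marker in total}

trials = [
    {"marker": "strengthener", "lm_answer": "A", "user_answer": "A"},
    {"marker": "plain",        "lm_answer": "B", "user_answer": "B"},
    {"marker": "weakener",     "lm_answer": "C", "user_answer": "D"},
]
print(reliance_by_marker(trials))
# {'strengthener': 1.0, 'plain': 1.0, 'weakener': 0.0}
```

High agreement rates for both confidently marked and unmarked responses would reflect the over-reliance pattern the paper reports.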

Origins of Overconfidence and Potential Mitigations

Investigating the origins of this overconfidence, the authors point to reinforcement learning from human feedback (RLHF) as playing a pivotal role: human annotators show a bias against expressions of uncertainty in the preference-annotated texts used for RLHF alignment. These findings suggest corrective action in how LMs are designed and trained so that they produce more linguistically calibrated responses. Possible mitigations include generating expressions of uncertainty more naturally and reserving plain, unhedged statements for cases where the model's confidence is genuinely high.
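
One simple way to probe such a bias in preference data (a sketch with an assumed pair format and marker list, not the paper's code) is to check, among pairs where exactly one response hedges, how often the hedged response is the one annotators rejected:

```python
# Sketch: probe RLHF preference pairs for bias against hedged language.
# The weakener list and example pairs are illustrative assumptions.

WEAKENERS = {"i think", "maybe", "possibly", "it could be", "i'm not sure"}

def has_weakener(text: str) -> bool:
    lowered = text.lower()
    return any(w in lowered for w in WEAKENERS)

def hedged_rejection_rate(pairs) -> float:
    """`pairs`: iterable of (chosen, rejected) responses from a preference dataset."""
    relevant = hedged_rejected = 0
    for chosen, rejected in pairs:
        c, r = has_weakener(chosen), has_weakener(rejected)
        if c != r:                     # exactly one side hedges
            relevant += 1
            hedged_rejected += int(r)  # the hedged side lost the comparison
    return hedged_rejected / relevant if relevant else 0.0

pairs = [
    ("The capital is Paris.", "I think the capital might be Paris."),
    ("Maybe the answer is 42.", "The answer is 42."),
]
print(hedged_rejection_rate(pairs))  # 0.5 in this toy example
```

A rate well above 0.5 on real preference data would suggest annotators systematically disprefer hedged responses, a signal that, propagated through RLHF, would push models toward unhedged, confident-sounding text.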

Conclusion and Forward Thinking

In conclusion, the research shows that current LM practices in expressing uncertainty are not aligned with ideal human-AI communication standards, and that this misalignment shapes how heavily humans rely on AI-generated responses. Identifying the RLHF process as one source of overconfidence opens the door to reconsidering and refining how LMs are trained, ultimately leading to more reliable and safer human-AI interactions.
