Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty

(2401.06730)
Published Jan 12, 2024 in cs.CL , cs.AI , and cs.HC

Abstract

As natural language becomes the default interface for human-AI interaction, there is a critical need for LMs to appropriately communicate uncertainties in downstream applications. In this work, we investigate how LMs incorporate confidence about their responses via natural language and how downstream users behave in response to LM-articulated uncertainties. We examine publicly deployed models and find that LMs are reluctant to express uncertainties when answering questions even when they produce incorrect responses. LMs can be explicitly prompted to express confidences, but tend to be overconfident, resulting in high error rates (on average 47%) among confident responses. We test the risks of LM overconfidence by running human experiments and show that users rely heavily on LM generations, whether or not they are marked by certainty. Lastly, we investigate the preference-annotated datasets used in RLHF alignment and find that humans have a bias against texts with uncertainty. Our work highlights a new set of safety harms facing human-LM interactions and proposes design recommendations and mitigating strategies moving forward.

Overview

  • Language models often fail to express uncertainty accurately, potentially misleading users.

  • Analyses of epistemic markers, the linguistic devices that signal certainty or doubt, show that LMs lean toward overconfidence even when their answers are incorrect.

  • Users tend to over-rely on LM responses whether or not those responses are presented with expressions of confidence, which skews decision-making.

  • Human annotator bias during reinforcement learning from human feedback (RLHF) contributes to models' overconfidence.

  • Refining LM training processes may lead to better alignment with human-AI communication norms.

Introduction to Language Models and Epistemic Markers

Language models (LMs) like GPT, LLaMA-2, and Claude are at the forefront of human-AI interfaces, facilitating a range of tasks through natural language interaction. A critical aspect of this interface is a model's ability to communicate its confidence, or lack thereof, in its responses, which is particularly consequential in information-seeking scenarios. Epistemic markers, the linguistic devices that convey a speaker's degree of certainty (for example, "I'm sure" versus "I think"), are one way to communicate these uncertainties clearly. However, the research shows that LMs struggle to express uncertainty accurately, which can impair users' decision-making when they rely on AI-generated information.

Investigating Expression of Uncertainty in LMs

The paper's analysis indicates that LMs rarely volunteer uncertainty on their own and, even with explicit prompting, skew toward overconfidence. When asked to articulate their confidence in a response using epistemic markers, LMs use strengtheners (expressions of certainty) far more often than weakeners (expressions of uncertainty), even though a significant portion of those confidently marked responses are incorrect.
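
As a rough illustration of this kind of measurement (a sketch under assumed marker lists, not the paper's exact protocol), one can tag each response for the epistemic markers it contains and compute the error rate among responses tagged as confident:

```python
# Sketch: tag LM answers by epistemic marker and measure the error rate among
# confidently worded answers. The marker lists and demo data are illustrative only.

STRENGTHENERS = {"certainly", "definitely", "without a doubt", "clearly", "i'm sure"}
WEAKENERS = {"i think", "maybe", "possibly", "it could be", "i'm not sure"}

def tag_markers(response: str) -> str:
    """Label a response as 'weakener', 'strengthener', or 'plain'."""
    text = response.lower()
    if any(w in text for w in WEAKENERS):
        return "weakener"
    if any(s in text for s in STRENGTHENERS):
        return "strengthener"
    return "plain"

def confident_error_rate(answers) -> float:
    """Fraction of strengthener-marked answers that are wrong.

    `answers` is an iterable of (response_text, is_correct) pairs.
    """
    confident = [ok for resp, ok in answers if tag_markers(resp) == "strengthener"]
    return 0.0 if not confident else sum(not ok for ok in confident) / len(confident)

# Toy example: two confidently worded answers, one of them incorrect.
demo = [
    ("The answer is certainly Paris.", True),
    ("It is definitely 1947.", False),
    ("I think it might be oxygen.", True),
]
print(confident_error_rate(demo))  # 0.5
```

In the paper's experiments, the analogous error rate among confidently marked responses averaged 47%.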

User Response to LMs' Confidence Expressions

Understanding how users interpret and rely on epistemic markers from LMs is crucial. The paper's human experiments find that users rely heavily on LM responses whether those responses carry expressions of high confidence or no epistemic markers at all; plain statements are implicitly read as certain. Even modest miscalibration in how LMs use these markers can substantially degrade user performance over time. The tendency of LMs to convey overconfidence therefore encourages over-reliance on AI, underscoring the need for linguistic calibration: aligning the confidence a model expresses with its actual accuracy.
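
A minimal sketch of how such reliance might be quantified (the trial format and field names are assumptions for illustration, not the authors' implementation) is to compute, per marker type, how often a participant's final answer matches the LM's suggestion:

```python
# Sketch: measure user reliance as the rate at which participants' final answers
# match the LM's suggestion, split by how the suggestion was marked.
# The trial records below are hypothetical.
from collections import defaultdict

def reliance_by_marker(trials):
    """`trials`: iterable of dicts with 'marker', 'lm_answer', 'user_answer'."""
    agree, total = defaultdict(int), defaultdict(int)
    for t in trials:
        total[t["marker"]] += 1
        agree[t["marker"]] += int(t["user_answer"] == t["lm_answer"])
    return {marker: agree[marker] / total[marker] for marker in total}

trials = [
    {"marker": "strengthener", "lm_answer": "A", "user_answer": "A"},
    {"marker": "plain",        "lm_answer": "B", "user_answer": "B"},
    {"marker": "weakener",     "lm_answer": "C", "user_answer": "D"},
]
print(reliance_by_marker(trials))
# {'strengthener': 1.0, 'plain': 1.0, 'weakener': 0.0}
```

High agreement rates for both confidently marked and unmarked responses would reflect the over-reliance pattern the paper reports.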

Origins of Overconfidence and Potential Mitigations

Investigating the origins of this overconfidence, the authors point to reinforcement learning from human feedback (RLHF) as playing a pivotal role: human annotators show a bias against expressions of uncertainty in the preference-annotated texts used for RLHF alignment. These findings suggest corrective action in how LMs are designed and trained so that they produce more linguistically calibrated responses. Possible mitigations include generating expressions of uncertainty more naturally and reserving plain, unhedged statements for cases where the model's confidence is genuinely high.
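
One simple way to probe such a bias in preference data (a sketch with an assumed pair format and marker list, not the paper's code) is to check, among pairs where exactly one response hedges, how often the hedged response is the one annotators rejected:

```python
# Sketch: probe RLHF preference pairs for bias against hedged language.
# The weakener list and example pairs are illustrative assumptions.

WEAKENERS = {"i think", "maybe", "possibly", "it could be", "i'm not sure"}

def has_weakener(text: str) -> bool:
    lowered = text.lower()
    return any(w in lowered for w in WEAKENERS)

def hedged_rejection_rate(pairs) -> float:
    """`pairs`: iterable of (chosen, rejected) responses from a preference dataset."""
    relevant = hedged_rejected = 0
    for chosen, rejected in pairs:
        c, r = has_weakener(chosen), has_weakener(rejected)
        if c != r:                     # exactly one side hedges
            relevant += 1
            hedged_rejected += int(r)  # the hedged side lost the comparison
    return hedged_rejected / relevant if relevant else 0.0

pairs = [
    ("The capital is Paris.", "I think the capital might be Paris."),
    ("Maybe the answer is 42.", "The answer is 42."),
]
print(hedged_rejection_rate(pairs))  # 0.5 in this toy example
```

A rate well above 0.5 on real preference data would suggest annotators systematically disprefer hedged responses, a signal that, propagated through RLHF, would push models toward unhedged, confident-sounding text.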

Conclusion and Forward Thinking

In conclusion, the research shows that current LM practices in expressing uncertainty are not aligned with ideal human-AI communication standards, and that this misalignment shapes how heavily humans rely on AI-generated responses. Identifying the RLHF process as one source of overconfidence opens the door to reconsidering and refining how LMs are trained, ultimately leading to more reliable and safer human-AI interactions.
