
Large Language Models Must Be Taught to Know What They Don't Know

(arXiv:2406.08391)
Published Jun 12, 2024 in cs.LG, cs.AI, cs.CL, and stat.ML

Abstract

When using LLMs in high-stakes applications, we need to know when we can trust their predictions. Some works argue that prompting high-performance LLMs is sufficient to produce calibrated uncertainties, while others introduce sampling methods that can be prohibitively expensive. In this work, we first argue that prompting on its own is insufficient to achieve good calibration and then show that fine-tuning on a small dataset of correct and incorrect answers can create an uncertainty estimate with good generalization and small computational overhead. We show that a thousand graded examples are sufficient to outperform baseline methods and that training through the features of a model is necessary for good performance and tractable for large open-source models when using LoRA. We also investigate the mechanisms that enable reliable LLM uncertainty estimation, finding that many models can be used as general-purpose uncertainty estimators, applicable not just to their own uncertainties but also the uncertainty of other models. Lastly, we show that uncertainty estimates inform human use of LLMs in human-AI collaborative settings through a user study.

Fine-tuning language models for improved uncertainty estimates using a graded dataset.

Overview

  • The paper addresses the need for LLMs to generate reliable uncertainty estimates, especially in high-stakes applications.

  • It demonstrates the inadequacy of simple prompting and highlights the benefits of fine-tuning models on small datasets for enhanced uncertainty estimation.

  • The research includes user studies showing that well-calibrated confidence estimates improve human decision-making in collaborative AI settings.


Introduction

In the field of AI, accurately representing the uncertainty of LLM predictions is crucial, especially in high-stakes applications. This paper addresses a central challenge in this domain: when an LLM produces a prediction, we must be able to discern how trustworthy that prediction is. Existing approaches, which rely either on prompting the LLM for its confidence or on expensive sampling techniques, are debated in terms of both efficacy and practicality.

Core Contributions

The primary contributions of this paper are:

  1. Insufficiency of Prompting: The authors argue that simply prompting LLMs is not adequate for generating calibrated uncertainty estimates reliably.
  2. Effective Fine-Tuning: They show that fine-tuning models on a small dataset of graded correct and incorrect answers yields generalizable and computationally efficient uncertainty estimates, outperforming baseline approaches with only a thousand graded examples (a sketch of how such a dataset can be assembled follows this list).
  3. Mechanisms of Reliable Estimation: The study investigates how LLMs can serve as general-purpose uncertainty estimators, not only for their own outputs but also for other models.
  4. User Study on Collaborative Decision Making: The paper presents findings from a user study showing that uncertainty estimates can effectively inform human decision-making in collaborative human-AI settings.
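
To make the second contribution concrete, below is a minimal sketch of how a graded dataset could be assembled. The `generate_answer` hook and the normalized exact-match grading rule are illustrative assumptions, not the paper's exact grading protocol (which may use dataset-specific metrics or an LLM grader).

```python
# Minimal sketch: build a graded (question, answer, label) dataset by sampling
# answers from the model and grading them against references.

def normalize(text: str) -> str:
    return " ".join(text.lower().strip().split())

def grade(prediction: str, reference: str) -> int:
    """Label 1 if the sampled answer matches the reference, else 0.
    Exact match is an illustrative stand-in for the paper's grading."""
    return int(normalize(prediction) == normalize(reference))

def build_graded_dataset(qa_pairs, generate_answer, n_examples=1000):
    """qa_pairs: iterable of (question, reference) pairs.
    generate_answer: hypothetical hook that samples an answer from the LLM."""
    dataset = []
    for question, reference in qa_pairs:
        answer = generate_answer(question)
        dataset.append((question, answer, grade(answer, reference)))
        if len(dataset) >= n_examples:  # ~1k graded examples suffice per the paper
            break
    return dataset
```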

Experimental Findings

Zero-Shot and Black-Box Methods

Initial experiments reveal that standard zero-shot prompting and black-box techniques, such as perplexity in open-ended generation, fail to provide reliable uncertainty estimates. Their AUROC scores show limited predictive power and improve only slowly with model size, indicating that calibration does not emerge from out-of-the-box methods and motivating a structured fine-tuning approach.
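
For reference, here is a minimal sketch of the perplexity-style black-box baseline, assuming per-token log-probabilities are available from the generation API; `token_logprobs_per_answer` and `labels` are hypothetical names for graded model outputs like those produced above.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def sequence_confidence(token_logprobs):
    """Length-normalized log-likelihood; higher = more confident.
    This equals the negative log of the sequence perplexity."""
    return float(np.mean(token_logprobs))

def auroc_of_confidence(token_logprobs_per_answer, labels):
    """AUROC of confidence scores against 0/1 correctness labels.
    A value near 0.5 means the score has no discriminative power."""
    scores = [sequence_confidence(lp) for lp in token_logprobs_per_answer]
    return roc_auc_score(labels, scores)
```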

Fine-Tuning Approach

Three parameterization techniques were examined for producing fine-tuned uncertainty estimates (the simplest, Probe, is sketched after this list):

  1. Probe: A feed-forward neural network trained on the last-layer features of the pre-trained LLM, which remains frozen.
  2. LoRA (Low-Rank Adapters): The same classification head as the Probe, but with trainable low-rank adapters added to the base model.
  3. LoRA + Prompt: LoRA fine-tuning combined with a language prompt that frames correctness prediction as binary classification over pre-defined tokens.
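
A minimal sketch of the Probe variant follows, assuming the frozen base model's hidden state at the final token has already been extracted into a `features` tensor and paired with 0/1 `labels` from the graded dataset; the hidden size and head architecture are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# "Probe" parameterization: a small MLP trained on frozen last-layer features.
probe = nn.Sequential(
    nn.Linear(4096, 256),  # 4096 = assumed base-model hidden size
    nn.ReLU(),
    nn.Linear(256, 1),     # single logit: P(answer is correct)
)

def train_probe(features, labels, epochs=20, lr=1e-3):
    """features: (N, 4096) float tensor; labels: (N,) 0/1 tensor."""
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(probe(features).squeeze(-1), labels.float())
        loss.backward()
        opt.step()
    return probe

# At inference, confidence = torch.sigmoid(probe(feature_vector)).
```

Because the base model stays frozen, the probe is cheap to train; the LoRA variants instead make low-rank updates to the base weights, which the paper finds further improves uncertainty estimates.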

The fine-tuning methods demonstrated substantial improvements in calibration and selective prediction metrics. Specifically, the LoRA and LoRA + Prompt methods achieved significant reductions in Expected Calibration Error (ECE) and gains in AUROC over zero-shot methods.
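
For concreteness, here is a minimal sketch of the standard binned ECE metric; the bin count and equal-width binning scheme are common defaults rather than details confirmed by the paper.

```python
import numpy as np

def expected_calibration_error(confidences, labels, n_bins=10):
    """Binned ECE: the weighted average gap between mean confidence
    and accuracy within equal-width confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Map each confidence in [0, 1] to one of n_bins equal-width bins.
    bin_ids = np.minimum((confidences * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(labels[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
    return ece
```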

Generalization and Robustness

Fine-tuning exhibited strong generalization capabilities across different distribution shifts:

  • Subject Matter: Despite varying proportions of different subjects in the fine-tuning datasets, the fine-tuned models performed consistently across diverse subjects.
  • Format Shifts: Models fine-tuned on open-ended questions could generalize well to multiple-choice settings and vice versa.
  • Problem Solvability: On datasets comprising both answerable and unanswerable questions, fine-tuned models assigned lower confidence to unanswerable questions, indicating robustness in distinguishing between solvable and unsolvable queries.

Furthermore, cross-model evaluations indicated that models trained for uncertainty estimation on different base models can competently predict the correctness of generations from other models. This highlights that the understanding of uncertainty can generalize beyond the specifics of a given model's internal structure.
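
A minimal sketch of how this cross-model evaluation can be framed, assuming a trained `estimator_confidence` callable built around model A and graded generations from a second model B; the names are illustrative, not the paper's code.

```python
from sklearn.metrics import roc_auc_score

def cross_model_auroc(estimator_confidence, generator_outputs):
    """Measure whether an uncertainty estimator transfers across models.

    estimator_confidence: callable (question, answer) -> confidence in [0, 1],
        e.g. a probe or LoRA+Prompt head trained on model A's graded answers.
    generator_outputs: list of (question, answer, label) triples, where the
        answers were sampled from model B and graded as before.
    """
    scores = [estimator_confidence(q, a) for q, a, _ in generator_outputs]
    labels = [y for _, _, y in generator_outputs]
    return roc_auc_score(labels, scores)
```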

User Study Insights

A user study conducted with $N=181$ participants demonstrated that users are sensitive to well-calibrated confidence estimates when collaborating with LLMs. Participants adjusted their reliance on the LLM's predictions based on the reported confidence, particularly benefiting lower-performing users by aiding their decision-making processes. This underscores the practical impact of well-calibrated uncertainties in enhancing human-AI collaborative efforts.

Conclusion

The paper concludes that robust uncertainty estimates are achievable with relatively modest amounts of fine-tuning data and that these estimates deliver practical benefits in real-world applications. Future work should focus on integrating uncertainty estimates directly into the LLMs' generation processes and on more sophisticated handling of unsolvable tasks and contextual nuances in predictions.

Implications and Future Directions

The findings imply several avenues for further research and development:

  1. Unified Models: Developing models that integrate both answer generation and uncertainty estimation without switching weights.
  2. Active Learning: Utilizing high-quality uncertainties in active learning frameworks to optimize fine-tuning and model performance efficiently.
  3. Human-Centered Design: Enhancing user interfaces to better communicate confidence and leveraging interdisciplinary studies to refine how confidence scores are interpreted and used by human collaborators.

The paper stands as a significant step towards making LLMs more reliable and interpretable, thereby facilitating safer and more effective deployment of AI systems in complex decision-making scenarios.
