Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words? (2405.16908v2)
Abstract: We posit that LLMs should be capable of expressing their intrinsic uncertainty in natural language. For example, if the LLM is equally likely to output two contradicting answers to the same question, then its generated response should reflect this uncertainty by hedging its answer (e.g., "I'm not sure, but I think..."). We formalize faithful response uncertainty based on the gap between the model's intrinsic confidence in the assertions it makes and the decisiveness with which they are conveyed. This example-level metric reliably indicates whether the model reflects its uncertainty, as it penalizes both excessive and insufficient hedging. We evaluate a variety of aligned LLMs at faithfully communicating uncertainty on several knowledge-intensive question answering tasks. Our results provide strong evidence that modern LLMs are poor at faithfully conveying their uncertainty, and that better alignment is necessary to improve their trustworthiness.
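
To make the idea concrete, here is a minimal illustrative sketch of an example-level faithfulness score built from the two quantities the abstract names: the model's intrinsic confidence in each assertion and the decisiveness of the wording that conveys it. This is an assumption about how such a gap-based metric could be computed, not the paper's official implementation; the function name `faithfulness`, the dictionary keys, and the specific scores are hypothetical.

```python
# Illustrative sketch (assumed, not the paper's exact metric): score one
# response by how well the decisiveness of its wording tracks the model's
# intrinsic confidence in each assertion. Both values are assumed in [0, 1].

from statistics import mean


def faithfulness(assertions: list[dict]) -> float:
    """Each assertion is a dict with:
      'confidence'   -- intrinsic probability the model assigns to the claim
      'decisiveness' -- how certain the phrasing sounds (1.0 = fully decisive;
                        lower for hedges like "I'm not sure, but...")
    Returns 1 minus the mean absolute gap, so 1.0 means the hedging exactly
    mirrors the intrinsic uncertainty. Both overconfident wording and
    unnecessary hedging shrink the score symmetrically.
    """
    gaps = [abs(a["decisiveness"] - a["confidence"]) for a in assertions]
    return 1.0 - mean(gaps)


# A model that is only 55% sure of its answer:
print(faithfulness([{"confidence": 0.55, "decisiveness": 0.95}]))  # 0.60: stated too decisively
print(faithfulness([{"confidence": 0.55, "decisiveness": 0.55}]))  # 1.00: hedging matches uncertainty
```

Under this reading, a response scores highly only when its verbal hedging (or lack of it) matches the model's internal confidence, which is why the metric penalizes both excessive and insufficient hedging.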