Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach (2404.15993v4)

Published 24 Apr 2024 in cs.LG and cs.CL

Abstract: In this paper, we study the problem of uncertainty estimation and calibration for LLMs. We begin by formulating the uncertainty estimation problem, a relevant yet underexplored area in existing literature. We then propose a supervised approach that leverages labeled datasets to estimate the uncertainty in LLMs' responses. Based on the formulation, we illustrate the difference between the uncertainty estimation for LLMs and that for standard ML models and explain why the hidden neurons of the LLMs may contain uncertainty information. Our designed approach demonstrates the benefits of utilizing hidden activations to enhance uncertainty estimation across various tasks and shows robust transferability in out-of-distribution settings. We distinguish the uncertainty estimation task from the uncertainty calibration task and show that better uncertainty estimation leads to better calibration performance. Furthermore, our method is easy to implement and adaptable to different levels of model accessibility including black box, grey box, and white box.


Summary

  • The paper introduces a supervised calibration framework leveraging LLM hidden activations to quantify uncertainty in responses.
  • The methodology integrates white-box and grey-box features to yield improved AUROC performance over unsupervised benchmarks.
  • The approach offers practical insights for enhancing LLM trustworthiness across tasks like question answering and machine translation.

Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach

This paper, titled "Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach," addresses the problem of estimating the uncertainty in responses generated by LLMs. The authors propose a supervised method to quantify and calibrate uncertainty by exploiting the models' hidden activations. Through systematic evaluation across model architectures and estimation methods, the work improves understanding of LLMs' inherent uncertainties, with the aim of making them more trustworthy and applicable across tasks.

Introduction

Rapid advances in natural language processing powered by LLMs have markedly improved the ability to understand and generate human-like text. However, these models often produce unreliable outputs, creating a risk of misinformation. Traditional uncertainty estimation approaches in machine learning, designed for fixed-dimensional outputs, struggle with the variable-length outputs typical of natural language generation (NLG) tasks. This paper introduces a supervised approach that leverages hidden activations from LLMs, setting it apart from traditional methods, which typically do not exploit a model's internal representations.

Methodology

The proposed methodology is a supervised framework for uncertainty estimation, fundamentally distinct from existing black-box metrics that rely only on output-level signals such as entropy or response similarity, without access to internal activations. By additionally drawing on white-box features, i.e., internal activations, the method derives uncertainty metrics that capture more than entropy-based scores alone.
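For concreteness, the grey-box signals referenced here can be computed from token-level statistics alone. The sketch below is illustrative (the function and variable names are ours, not the paper's); it turns per-token log-probabilities and next-token entropies into a few scalar confidence features of the kind the supervised method builds on:

```python
import math

def grey_box_features(token_logprobs, token_entropies):
    """Output-only (grey-box) uncertainty features for one generated answer.

    token_logprobs:  log-probabilities of the sampled tokens.
    token_entropies: entropies of the next-token distribution at each step.
    Higher values indicate lower confidence.
    """
    n = max(len(token_logprobs), 1)
    mean_nll = -sum(token_logprobs) / n                 # average negative log-likelihood
    perplexity = math.exp(mean_nll)                     # length-normalized perplexity
    mean_entropy = sum(token_entropies) / max(len(token_entropies), 1)
    return {"mean_nll": mean_nll, "perplexity": perplexity, "mean_entropy": mean_entropy}

# Example: a short, confidently generated answer.
print(grey_box_features([-0.1, -0.3, -0.05], [0.4, 0.9, 0.2]))
```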

Problem Setup

For a given input prompt $\bm{x}$, an LLM generates a response $\bm{y}$ through a sequence of probabilistic token selections, modeled here for tasks such as question answering. Uncertainty estimation is then defined as learning a function $g(\bm{x}, \bm{y})$ that predicts the scoring function $s(\bm{y}, \bm{y}_{\text{true}})$, i.e., the expected correctness of the generated response.
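Under this formulation, the estimator can be read as a regression of the correctness score on the prompt-response pair. A minimal way to write the objective (the squared-error loss and the hypothesis class $\mathcal{G}$ are illustrative assumptions, not necessarily the paper's exact choices) is:

```latex
% Supervised uncertainty estimation as regression on the correctness score.
% Squared error is one illustrative choice of loss.
\hat{g} \;\in\; \arg\min_{g \in \mathcal{G}} \;
  \mathbb{E}_{(\bm{x},\, \bm{y},\, \bm{y}_{\text{true}})}
  \Big[ \big( g(\bm{x}, \bm{y}) - s(\bm{y}, \bm{y}_{\text{true}}) \big)^{2} \Big].
```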

Supervised Calibration

The supervised calibration model draws features from both white-box (hidden activations) and grey-box (e.g., entropy-based) sources, resulting in a structured dataset that informs the learning model for uncertainty estimation. This approach extracts richer uncertainty insights than unsupervised methods (Figure 1).

Figure 1: Features from Gemma-7B.
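A minimal sketch of this supervised pipeline, assuming feature vectors and correctness labels have already been extracted (the random-forest estimator, synthetic placeholder arrays, and dimensions below are our illustrative choices, not the paper's exact configuration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
hidden_feats = rng.normal(size=(n, 256))   # placeholder for pooled mid-layer activations
grey_feats = rng.normal(size=(n, 3))       # placeholder for entropy/NLL-style features
labels = rng.integers(0, 2, size=n)        # 1 if the generated answer was judged correct

# White-box and grey-box features are concatenated into one supervised dataset.
X = np.concatenate([hidden_feats, grey_feats], axis=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr, y_tr)

# Predicted probability of correctness serves as the confidence/uncertainty score.
confidence = clf.predict_proba(X_te)[:, 1]
print("AUROC:", roc_auc_score(y_te, confidence))
```

On the synthetic placeholders above the AUROC will hover around chance; with real activation and entropy features, this is the quantity on which the paper reports its gains.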

Evaluation and Results

The authors implement their approach on several LLMs, including LLaMA-2-7B and Gemma-7B, and evaluate it on question answering (TriviaQA, CoQA) and machine translation (WMT 2014). Using the area under the receiver operating characteristic curve (AUROC) as the performance metric, the paper demonstrates that the method outperforms existing benchmarks, which typically rely on a single unsupervised feature.
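For the question-answering tasks, correctness labels of this kind are commonly derived by comparing the generated answer to the reference, for example by thresholding an LCS-based ROUGE-L score; the sketch below and its 0.3 cutoff are illustrative assumptions rather than the paper's exact labeling rule:

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f(candidate: str, reference: str, beta: float = 1.2) -> float:
    """LCS-based ROUGE-L F-measure between a generated and a reference answer."""
    c, r = candidate.lower().split(), reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    precision, recall = lcs / len(c), lcs / len(r)
    if precision + recall == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)

# Hypothetical labeling rule: answers scoring above the cutoff count as correct.
label = int(rouge_l_f("William Shakespeare wrote it", "William Shakespeare") >= 0.3)
print(label)
```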

Robust Performance

The empirical results consistently show that the supervised approach, leveraging internal activations, outperforms existing unsupervised methods. This includes not only improved AUROC scores but also better calibration of uncertainty estimates across both in-distribution and out-of-distribution (OOD) scenarios (Figure 2).

Figure 2: Uncertainty scores of different methods on the MMLU dataset for answers provided by the Gemma-7B model.
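To illustrate how an uncertainty score translates into calibration, one simple recipe is to recalibrate the score on held-out data (isotonic regression here; Platt scaling would also work) and measure the expected calibration error before and after. The synthetic scores below are placeholders, not the paper's results:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: bin-weighted gap between mean confidence and empirical accuracy."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs > lo) & (probs <= hi)
        if mask.any():
            ece += mask.mean() * abs(labels[mask].mean() - probs[mask].mean())
    return ece

# Placeholder confidence scores and correctness labels (deliberately miscalibrated).
rng = np.random.default_rng(1)
conf = rng.uniform(size=1000)
correct = (rng.uniform(size=1000) < conf ** 0.5).astype(float)

# In practice, fit the recalibrator on a calibration split and apply it to test data;
# fitting and evaluating on the same array here just keeps the sketch short.
iso = IsotonicRegression(out_of_bounds="clip")
calibrated = iso.fit_transform(conf, correct)

print("ECE before:", round(expected_calibration_error(conf, correct), 3))
print("ECE after: ", round(expected_calibration_error(calibrated, correct), 3))
```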

Discussion

Layer and Architecture Insights

The research highlights the advantage of extracting features from middle-layer activations, which appear to encode more useful uncertainty information than last-layer activations, whose representations are specialized for immediate next-token prediction. With respect to scale, the experiments show no significant performance difference across model sizes, suggesting that larger models do not necessarily yield better uncertainty signals.
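As a sketch of how such mid-layer features might be pulled from an open model with the Hugging Face transformers API (the model name, layer index, and mean-pooling below are our illustrative choices, not necessarily the paper's exact setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"   # illustrative; gated repo, any open causal LM works
device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
model.eval()

def mid_layer_feature(prompt_and_answer: str, layer: int = 16) -> torch.Tensor:
    """Return a mean-pooled hidden-state vector from a middle decoder layer."""
    inputs = tok(prompt_and_answer, return_tensors="pt").to(device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # out.hidden_states is a tuple: embedding output plus one tensor per layer,
    # each of shape (batch, seq_len, hidden_dim).
    h = out.hidden_states[layer]
    return h.mean(dim=1).squeeze(0).float()  # pool over tokens

feature = mid_layer_feature("Q: Who wrote Hamlet?\nA: William Shakespeare")
print(feature.shape)  # e.g. torch.Size([4096]) for a 7B model
```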

Practical Implications and Future Directions

By demonstrating that internal activations can be effectively harnessed for uncertainty estimation, the paper paves the way for applications that improve LLM trustworthiness. Furthermore, the approach opens new avenues for improving uncertainty predictions for closed-source LLMs by leveraging publicly available models.

Conclusion

This paper systematically explores uncertainty estimation for LLMs through a supervised approach, demonstrating clear gains over existing methods. The findings underscore the value of hidden activations for improving the reliability and robustness of LLM outputs across diverse NLP tasks. Future work may explore fine-tuning models to tailor uncertainty estimates to specific applications and extending the approach to open-domain questions beyond structured NLP datasets.
