
Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs (2402.08733v2)

Published 13 Feb 2024 in cs.LG

Abstract: Identifying how much a model $\widehat{p}_{\theta}(Y|X)$ knows about the stochastic real-world process $p(Y|X)$ it was trained on is important to ensure it avoids producing incorrect or "hallucinated" answers or taking unsafe actions. But this is difficult for generative models because probabilistic predictions do not distinguish between per-response noise (aleatoric uncertainty) and lack of knowledge about the process (epistemic uncertainty), and existing epistemic uncertainty quantification techniques tend to be overconfident when the model underfits. We propose a general strategy for teaching a model to both approximate $p(Y|X)$ and also estimate the remaining gaps between $\widehat{p}_{\theta}(Y|X)$ and $p(Y|X)$: train it to predict pairs of independent responses drawn from the true conditional distribution, allow it to "cheat" by observing one response while predicting the other, then measure how much it cheats. Remarkably, we prove that being good at cheating (i.e. cheating whenever it improves your prediction) is equivalent to being second-order calibrated, a principled extension of ordinary calibration that allows us to construct provably-correct frequentist confidence intervals for $p(Y|X)$ and detect incorrect responses with high probability. We demonstrate empirically that our approach accurately estimates how much models don't know across ambiguous image classification, (synthetic) language modeling, and partially-observable navigation tasks, outperforming existing techniques.

Summary

  • The paper introduces a novel paired-response strategy that quantifies epistemic uncertainty through calibrated cheating behavior.
  • It extends traditional calibration to a second-order framework, enabling accurate confidence intervals and detection of statistical hallucinations.
  • Empirical validations across ambiguous image classification, synthetic language modeling, and partially observable navigation tasks demonstrate the method's superiority over standard uncertainty quantification techniques.

A Principled Approach for Quantifying Model Uncertainty through Paired Predictions and its Applications

Introduction

In the field of generative models, such as LLMs, accurately quantifying what the model does not know is crucial to prevent incorrect outputs or actions, especially in scenarios where model decisions can have significant consequences. Traditional probabilistic predictions struggle to differentiate between variability inherent to the data (aleatoric uncertainty) and the model’s own uncertainty due to lack of knowledge or data (epistemic uncertainty). Existing techniques for quantifying epistemic uncertainty often fall short, particularly when the model underfits the data. This research addresses these challenges by introducing a novel strategy for simultaneously approximating a true stochastic process and estimating the uncertainty in that approximation. The strategy is based on training models to predict paired responses and allowing them to "cheat" under controlled conditions, leading to a method that correlates model "cheating" behavior with its uncertainty about the process being modeled. The approach is shown to accurately estimate model knowledge across various tasks, outperforming existing uncertainty quantification baselines.

Methodology

Key to this paper is the introduction of second-order calibration, which extends traditional (first-order) calibration: the model must not only report the average of the true probability p(Y|X) over inputs that receive a given prediction, but also correctly report how much that true probability varies around the prediction. The proposed method involves training models to predict independent pairs of responses drawn from the true distribution, allowing the model to observe one response ("cheat") while predicting the other, and then measuring how much that observation improves the prediction. The novelty lies in demonstrating that this cheating behavior, when properly calibrated, serves as a robust indicator of the model's epistemic uncertainty.
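
To make the recipe concrete, here is a minimal NumPy sketch, assuming a K-class problem and a model that outputs a K x K joint table over two independent responses; the function names and the covariance-style read-out are illustrative choices, not the authors' reference implementation.

```python
import numpy as np

def pair_nll(joint_logits, y1, y2):
    """Pair-prediction loss: negative log-likelihood of an observed
    pair of independent responses (y1, y2) for the same input x.

    joint_logits: (K, K) unnormalized scores for the model's joint
    distribution over two draws from p(Y | x)."""
    log_z = np.log(np.exp(joint_logits).sum())
    return -(joint_logits[y1, y2] - log_z)

def marginal_and_epistemic_variance(joint_probs):
    """Cheating-based uncertainty read-out.

    joint_probs: (K, K) predicted joint distribution over a response
    pair.  The row sums give the ordinary (first-order) prediction;
    the gap between the joint table and the product of its marginals
    estimates Cov[p(Y=i|x), p(Y=j|x)], and its diagonal estimates
    Var[p(Y=k|x)] -- how much the model expects observing one
    response to change its prediction of the other."""
    marginal = joint_probs.sum(axis=1)               # p_hat(Y = k | x)
    cov = joint_probs - np.outer(marginal, marginal)
    return marginal, np.clip(np.diag(cov), 0.0, None)
```

In this formulation, "cheating" shows up as the gap between the joint table and the product of its marginals: if the model predicts the two responses as independent, observing one tells it nothing about the other, and the estimated epistemic variance collapses to zero.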

Theoretical Contributions

The paper provides a rigorous theoretical foundation for the proposed approach. It demonstrates that a model's ability to improve its predictions through cheating is equivalent to being second-order calibrated. Furthermore, the paper proves that, given a second-order calibrated model, it is possible to construct frequentist confidence intervals for the true probabilities of outcomes and effectively detect incorrect model responses (statistical hallucinations) with high probability. This equivalence between paired-response prediction and second-order calibration underpins the development of new tools for uncertainty quantification without making restrictive assumptions about the data distribution.
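
As a rough sketch of how such an interval might be assembled, the snippet below combines a calibrated mean and variance estimate via Chebyshev's inequality; the paper's own constructions may use tighter one-sided bounds and explicit calibration-error terms, so treat this as a simplified stand-in under those assumptions.

```python
import numpy as np

def chebyshev_interval(mean_est, var_est, delta=0.05):
    """Distribution-free interval for the true probability p(y | x),
    assuming `mean_est` and `var_est` come from a second-order
    calibrated model.  By Chebyshev's inequality, the true value lies
    inside the interval with probability at least 1 - delta over
    inputs receiving these estimates.  Illustrative only; the paper's
    bounds may differ in form."""
    half_width = np.sqrt(var_est / delta)
    lo = max(0.0, mean_est - half_width)
    hi = min(1.0, mean_est + half_width)
    return lo, hi

# Example: a response whose interval sits well below 1 can be flagged
# as a likely "statistical hallucination".
lo, hi = chebyshev_interval(mean_est=0.30, var_est=0.01, delta=0.1)
print(f"p(y|x) in [{lo:.2f}, {hi:.2f}] with ~90% confidence")
```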

Empirical Demonstrations

The effectiveness of the proposed method is empirically validated through applications to image classification, synthetic language modeling, and reinforcement learning tasks. These tasks are chosen to represent both discrete and sequential output spaces, as well as partially observable decision-making problems. Across these diverse settings, the method accurately quantifies the model's epistemic uncertainty and demonstrates its practical utility for improving model reliability. Notable is its application to offline reinforcement learning under partial observability, where it successfully avoids unsafe actions by accounting for unobserved confounders in decision-making processes.
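
One way such estimates could be operationalized, sketched below with hypothetical names and thresholds, is a simple abstention rule: keep a candidate response or action only when the lower end of its confidence interval clears a required probability. This is an illustrative filter, not the exact decision procedure used in the paper's experiments.

```python
import numpy as np

def flag_uncertain_responses(candidates, p_min=0.8, delta=0.1):
    """Illustrative filter: keep a candidate only if the lower
    Chebyshev-style bound on its true probability clears `p_min`;
    otherwise abstain.  `candidates` is a list of
    (response, mean_est, var_est) triples from a pair-trained model.
    High epistemic variance widens the interval, so under-determined
    candidates are rejected even when their point estimate looks good."""
    kept = []
    for response, mean_est, var_est in candidates:
        lower_bound = mean_est - np.sqrt(var_est / delta)
        if lower_bound >= p_min:
            kept.append(response)
    return kept if kept else ["<abstain>"]

# Hypothetical usage: the second candidate is rejected because its
# larger epistemic variance drags its lower bound below the threshold.
print(flag_uncertain_responses([("answer A", 0.95, 0.001),
                                ("answer B", 0.95, 0.050)]))
```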

Practical Implications and Future Directions

This research provides a significant step forward in understanding and quantifying model uncertainty, with implications for a broad range of AI applications. By improving how models estimate their own knowledge and uncertainty, the approach can contribute to safer AI systems that recognize and communicate their limitations, reducing the risk of unwarranted reliance on their outputs. Looking ahead, extending this approach to more complex and larger-scale models presents an exciting avenue for research. Additionally, exploring ways to operationalize paired predictions in settings where collecting paired responses may be challenging could further broaden the applicability of this strategy.

Concluding Remarks

In conclusion, this paper introduces a robust and theoretically grounded approach to uncertainty quantification for generative models, addressing a critical need in the development of reliable AI systems. Through a combination of theoretical insights and empirical validation, it establishes a framework for more accurately capturing and communicating what models do not know, paving the way for the creation of AI systems that can more safely interact with the real world.