Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs (2402.08733v2)
Abstract: Identifying how much a model $\widehat{p}_{\theta}(Y|X)$ knows about the stochastic real-world process $p(Y|X)$ it was trained on is important to ensure it avoids producing incorrect or "hallucinated" answers or taking unsafe actions. But this is difficult for generative models because probabilistic predictions do not distinguish between per-response noise (aleatoric uncertainty) and lack of knowledge about the process (epistemic uncertainty), and existing epistemic uncertainty quantification techniques tend to be overconfident when the model underfits. We propose a general strategy for teaching a model to both approximate $p(Y|X)$ and also estimate the remaining gaps between $\widehat{p}_{\theta}(Y|X)$ and $p(Y|X)$: train it to predict pairs of independent responses drawn from the true conditional distribution, allow it to "cheat" by observing one response while predicting the other, then measure how much it cheats. Remarkably, we prove that being good at cheating (i.e. cheating whenever it improves your prediction) is equivalent to being second-order calibrated, a principled extension of ordinary calibration that allows us to construct provably-correct frequentist confidence intervals for $p(Y|X)$ and detect incorrect responses with high probability. We demonstrate empirically that our approach accurately estimates how much models don't know across ambiguous image classification, (synthetic) language modeling, and partially-observable navigation tasks, outperforming existing techniques.
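As a minimal sketch of the quantities the abstract describes (not the paper's actual implementation), suppose a pair predictor trained on independent response pairs has produced a joint matrix `P_hat[i, j] ≈ p_hat(Y1 = i, Y2 = j | x)` over three classes; the numbers below are made up for illustration. If the model is second-order calibrated, this joint approximates $\mathbb{E}[p(Y_1|x)\,p(Y_2|x)^\top]$, so the marginal prediction and the epistemic covariance of $p(Y|x)$ can be read off directly:

```python
import numpy as np

# Hypothetical joint pair prediction p_hat(Y1, Y2 | x) over K = 3 classes.
# These values are illustrative only.
P_hat = np.array([
    [0.30, 0.05, 0.05],
    [0.05, 0.20, 0.05],
    [0.05, 0.05, 0.20],
])
assert np.isclose(P_hat.sum(), 1.0)

# First-order (marginal) prediction: p_hat(Y = k | x).
mu = P_hat.sum(axis=1)

# Under second-order calibration, P_hat approximates the second moment
# E[p(Y1|x) p(Y2|x)^T], so the epistemic covariance of p(Y|x) is the
# joint minus the outer product of the marginals.
Sigma = P_hat - np.outer(mu, mu)

# Per-class epistemic standard deviation: how uncertain the model is
# about the true probability p(Y = k | x) itself, as opposed to the
# aleatoric noise already captured by mu.
epi_std = np.sqrt(np.clip(np.diag(Sigma), 0.0, None))

for k in range(3):
    print(f"class {k}: p_hat = {mu[k]:.2f}, epistemic std ~ {epi_std[k]:.2f}")
```

A diagonal-heavy joint (the two sampled responses often agree more than independent draws from the marginal would) signals large epistemic covariance, i.e. the model "cheats" a lot when shown one response; a joint close to `np.outer(mu, mu)` signals that the model believes it already knows $p(Y|X)$.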