AdamMCMC: Combining Metropolis Adjusted Langevin with Momentum-based Optimization (2312.14027v3)
Abstract: Uncertainty estimation is a key issue when applying deep neural network methods in science and engineering. In this work, we introduce a novel algorithm that quantifies epistemic uncertainty via Monte Carlo sampling from a tempered posterior distribution. It combines the well-established Metropolis Adjusted Langevin Algorithm (MALA) with momentum-based optimization using Adam, and leverages a prolate proposal distribution to draw efficiently from the posterior. We prove that the constructed chain admits the Gibbs posterior as its invariant distribution and approximates this posterior in total variation distance. Furthermore, we demonstrate the efficiency of the resulting algorithm and the merit of the proposed changes on a state-of-the-art classifier from high-energy particle physics.
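The mechanics described in the abstract can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration under stated assumptions, not the authors' reference implementation: it uses a full-batch loss, assumes the parameters are flattened into a single 1-D tensor, and makes an illustrative choice for how the Adam moment buffers enter the reverse proposal (the paper specifies the exact bookkeeping). All names (`adam_drift`, `prolate_logpdf`, `adam_mcmc_step`) and hyperparameter values are hypothetical.

```python
import math
import torch


def adam_drift(theta, loss_fn, m, v, t, lr, beta1, beta2, eps):
    # Adam update direction evaluated at theta; returns the step u and the
    # updated moment buffers. theta is assumed to be a flat 1-D tensor.
    theta = theta.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(loss_fn(theta), theta)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    return -lr * m_hat / (v_hat.sqrt() + eps), m, v


def prolate_logpdf(x, mean, u, sigma, sigma_u):
    # log-density of N(x; mean, sigma^2 I + sigma_u^2 u u^T), using the
    # matrix determinant lemma and Sherman-Morrison so no d x d matrix
    # is ever formed.
    d = x.numel()
    r = x - mean
    c = sigma ** 2 + sigma_u ** 2 * (u @ u)
    quad = ((r @ r) - sigma_u ** 2 * (u @ r) ** 2 / c) / sigma ** 2
    logdet = (d - 1) * math.log(sigma ** 2) + torch.log(c)
    return -0.5 * (d * math.log(2 * math.pi) + logdet + quad)


def adam_mcmc_step(theta, loss_fn, m, v, t, lam, lr=1e-3, beta1=0.9,
                   beta2=0.999, eps=1e-8, sigma=5e-3, sigma_u=1.0):
    # One Metropolis-Hastings step whose proposal mean is an Adam update and
    # whose covariance is prolate (elongated along the update direction),
    # targeting the tempered Gibbs posterior exp(-lam * loss).
    u, m_new, v_new = adam_drift(theta, loss_fn, m, v, t, lr, beta1, beta2, eps)
    xi = torch.randn(())  # scalar noise along the update direction
    prop = theta + u + sigma * torch.randn_like(theta) + sigma_u * xi * u
    # Reverse drift at the proposal; reusing the updated buffers here is one
    # reasonable convention, chosen for illustration only.
    u_rev, _, _ = adam_drift(prop, loss_fn, m_new.clone(), v_new.clone(),
                             t, lr, beta1, beta2, eps)
    with torch.no_grad():
        log_alpha = (
            lam * (loss_fn(theta) - loss_fn(prop))
            + prolate_logpdf(theta, prop + u_rev, u_rev, sigma, sigma_u)
            - prolate_logpdf(prop, theta + u, u, sigma, sigma_u)
        )
    if torch.log(torch.rand(())) < log_alpha:
        return prop.detach(), m_new, v_new
    # On rejection the old moments are kept here; rolling the buffers
    # forward regardless would be an equally defensible sketch.
    return theta.detach(), m, v


# Toy usage: for loss(th) = 0.5 * ||th||^2 and lam = 1 the tempered posterior
# is a standard Gaussian, so the chain's samples can be sanity-checked.
loss_fn = lambda th: 0.5 * (th ** 2).sum()
theta, m, v = torch.randn(10), torch.zeros(10), torch.zeros(10)
samples = []
for t in range(1, 5001):
    theta, m, v = adam_mcmc_step(theta, loss_fn, m, v, t, lam=1.0)
    samples.append(theta)
```

Stretching the proposal covariance along the update direction (the `sigma_u**2 * u @ u.T` term) is what the abstract calls the prolate proposal: the chain can take large steps where the Adam drift is large, while the Metropolis correction keeps the tempered Gibbs posterior invariant.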
Authors: Sebastian Bieringer, Gregor Kasieczka, Maximilian F. Steffen, Mathias Trabs