User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient (1710.00095v4)
Abstract: In this paper, we study the problem of sampling from a given probability density function that is known to be smooth and strongly log-concave. We analyze several methods of approximate sampling based on discretizations of the (highly overdamped) Langevin diffusion and establish guarantees on their error measured in the Wasserstein-2 distance. Our guarantees improve or extend the state-of-the-art results in three directions. First, we provide an upper bound on the error of the first-order Langevin Monte Carlo (LMC) algorithm with an optimized varying step size. This result has the advantage of being horizon free (we do not need to know the target precision in advance) and of improving by a logarithmic factor on the corresponding result for a constant step size. Second, we study the case where accurate evaluations of the gradient of the log-density are unavailable, but one has access to approximations of this gradient. In such a situation, we consider both deterministic and stochastic approximations of the gradient and provide an upper bound on the sampling error of the first-order LMC that quantifies the impact of the gradient evaluation inaccuracies. Third, we establish nonasymptotic upper bounds on the sampling error of two versions of the second-order LMC, which leverage the Hessian of the log-density. These guarantees reveal that the second-order LMC algorithms improve on the first-order LMC in ill-conditioned settings.
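The first-order LMC recursion discussed in the abstract admits a compact implementation. The sketch below is a minimal illustration, not the paper's code: it runs the Euler discretization θ_{k+1} = θ_k − h_k ∇f(θ_k) + √(2 h_k) ξ_k with ξ_k ~ N(0, I_d) for a target density p ∝ exp(−f), where the gradient oracle may return a deterministic or stochastic approximation of ∇f, as in the inaccurate-gradient setting described above. The names `lmc_sample` and `grad_f`, and the example step-size schedule, are illustrative assumptions.

```python
import numpy as np

def lmc_sample(grad_f, theta0, step_sizes, rng=None):
    """Run the first-order LMC recursion and return the final iterate.

    grad_f     : callable returning a (possibly inexact) evaluation of the
                 gradient of f, where the target density is p ∝ exp(-f).
    theta0     : starting point (1-D array of dimension d).
    step_sizes : iterable of step sizes h_1, h_2, ... (constant or varying).
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    for h in step_sizes:
        xi = rng.standard_normal(theta.shape)  # xi_k ~ N(0, I_d)
        theta = theta - h * grad_f(theta) + np.sqrt(2.0 * h) * xi
    return theta

# Toy usage: standard Gaussian target f(theta) = ||theta||^2 / 2, so the exact
# gradient is the identity map; a small zero-mean perturbation mimics a
# stochastic gradient oracle.
rng = np.random.default_rng(0)
noisy_grad = lambda t: t + 0.05 * rng.standard_normal(t.shape)
sample = lmc_sample(noisy_grad, theta0=np.zeros(10),
                    step_sizes=[0.1] * 1000, rng=rng)
```

Passing the step sizes as an iterable makes it easy to switch between a constant schedule and a decreasing, horizon-free schedule of the kind analyzed in the paper.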
Authors: Arnak S. Dalalyan, Avetik G. Karagulyan