Joint control variate for faster black-box variational inference (2210.07290v4)
Published 13 Oct 2022 in cs.LG and stat.ML
Abstract: Black-box variational inference performance is sometimes hindered by the use of gradient estimators with high variance. This variance comes from two sources of randomness: data subsampling and Monte Carlo sampling. While existing control variates only address Monte Carlo noise, and incremental gradient methods typically only address data subsampling, we propose a new "joint" control variate that reduces variance from both sources of noise simultaneously. This significantly reduces gradient variance, leading to faster optimization in several applications.
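The abstract's core idea can be illustrated on a toy doubly-stochastic objective. The sketch below is not the paper's estimator: the integrand `f`, the linearized approximation `c`, and all constants are hypothetical stand-ins chosen only to show the mechanism, namely subtracting a cheap approximation that tracks both noise sources (the subsampled index and the Monte Carlo draw) and adding back its closed-form expectation, which leaves the estimator unbiased while lowering its variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy doubly-stochastic objective: F = (1/N) * sum_i E_eps[f_i(eps)],
# with two noise sources: the subsampled index i and the MC draw eps.
N = 1000
a = rng.normal(size=N)  # hypothetical per-datapoint coefficients

def f(i, eps):
    # Hypothetical nonlinear per-datapoint integrand.
    return a[i] * np.exp(eps)

def naive_estimate():
    # Plain doubly-stochastic estimator: sample i and eps, evaluate f.
    i = rng.integers(N)
    eps = rng.normal()
    return f(i, eps)

# Control-variate idea (schematic): c(i, eps) = a[i] * (1 + eps) is a
# linearization of f around eps = 0 whose exact expectation over BOTH
# noise sources is mean(a), computable in closed form.
a_mean = a.mean()

def cv_estimate():
    i = rng.integers(N)
    eps = rng.normal()
    c = a[i] * (1.0 + eps)          # correlates with both noise sources
    return f(i, eps) - c + a_mean   # unbiased: E[c] = a_mean

naive = np.array([naive_estimate() for _ in range(20000)])
cv = np.array([cv_estimate() for _ in range(20000)])
print(naive.var(), cv.var())  # the control-variate estimator's variance is lower
```

Both estimators target the same expectation; the second subtracts the correlated approximation and re-adds its known mean, which is the general control-variate construction the abstract describes, applied jointly to subsampling and Monte Carlo noise.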