Joint control variate for faster black-box variational inference (2210.07290v4)
Published 13 Oct 2022 in cs.LG and stat.ML
Abstract: Black-box variational inference performance is sometimes hindered by the use of gradient estimators with high variance. This variance comes from two sources of randomness: data subsampling and Monte Carlo sampling. While existing control variates only address Monte Carlo noise, and incremental gradient methods typically only address data subsampling, we propose a new "joint" control variate that jointly reduces variance from both sources of noise. This significantly reduces gradient variance, leading to faster optimization in several applications.
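The core idea, a control variate whose closed-form expectation is taken over both the datapoint index and the Monte Carlo noise, can be illustrated on a toy doubly stochastic objective. The sketch below is an assumption-laden illustration of that general idea, not the paper's actual construction; all names (`g`, `g_cv`, `y`, `w`) and the specific loss are hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy doubly stochastic objective: the single-sample gradient
#   g_i(eps) = exp(eps) * (w - y_i),   eps ~ N(0, 1),  i ~ Uniform{1..N},
# is noisy in BOTH the datapoint index i (subsampling) and the Monte Carlo
# sample eps. (Illustrative setup, not the paper's model.)
N = 500
y = rng.normal(size=N)
w = 0.3

def g(i, eps):
    return np.exp(eps) * (w - y[i])

# Joint-style control variate (a sketch): approximate g by
#   c_i(eps) = (1 + eps) * (w - y_i),
# a first-order expansion in eps whose expectation over BOTH noise sources
# is available in closed form, since E_eps[1 + eps] = 1.
c_mean = w - np.mean(y)  # E_i E_eps[c_i(eps)], computed once

def g_cv(i, eps):
    c = (1.0 + eps) * (w - y[i])
    return g(i, eps) - c + c_mean  # still unbiased: E[c] is added back

# Compare the two unbiased estimators empirically.
samples_naive, samples_cv = [], []
for _ in range(20000):
    i = rng.integers(N)
    eps = rng.normal()
    samples_naive.append(g(i, eps))
    samples_cv.append(g_cv(i, eps))

print("variance, naive   :", np.var(samples_naive))
print("variance, joint CV:", np.var(samples_cv))
```

Note that the residual `(exp(eps) - 1 - eps) * (w - y[i])` couples the two noise sources multiplicatively, so subtracting the control variate shrinks the variance contributed by subsampling and by Monte Carlo sampling jointly, rather than addressing only one source as a classical control variate or an incremental gradient method would.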
Authors: Xi Wang, Tomas Geffner, Justin Domke