Joint control variate for faster black-box variational inference (2210.07290v4)

Published 13 Oct 2022 in cs.LG and stat.ML

Abstract: Black-box variational inference performance is sometimes hindered by the use of gradient estimators with high variance. This variance comes from two sources of randomness: Data subsampling and Monte Carlo sampling. While existing control variates only address Monte Carlo noise, and incremental gradient methods typically only address data subsampling, we propose a new "joint" control variate that jointly reduces variance from both sources of noise. This significantly reduces gradient variance, leading to faster optimization in several applications.
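To make the variance-reduction idea concrete, the sketch below is a minimal, self-contained illustration (not the paper's estimator) of how a control variate with a tractable expectation can cancel noise coming from both data subsampling and Monte Carlo sampling in a doubly stochastic gradient estimate. The toy model, the anchor-parameter control variate, and all function names are illustrative assumptions introduced here for exposition only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setting: estimate the gradient of the data term of a
# Gaussian ELBO under two sources of noise:
#   (1) data subsampling (draw a random index i)
#   (2) Monte Carlo sampling (draw z from q via the reparameterization trick)
# Likelihood p(x_i | z) = N(x_i | z, 1), variational family q(z) = N(mu, 1).
N = 1000
x = rng.normal(loc=2.0, scale=1.0, size=N)   # synthetic data

def grad_estimate(mu, i, eps):
    """Naive doubly stochastic gradient of the data term w.r.t. mu."""
    z = mu + eps                      # reparameterized sample from q
    return N * (x[i] - z)             # single-datum, single-sample estimate

def control_variate(mu_anchor, i, eps):
    """A generic control variate: the same estimator evaluated at a stored
    'anchor' parameter value, whose expectation we can compute."""
    z = mu_anchor + eps
    return N * (x[i] - z)

def cv_expectation(mu_anchor):
    """Expectation of the control variate over both noise sources
    (closed form here only because the toy estimator is linear)."""
    return N * (x.mean() - mu_anchor)

mu, mu_anchor = 0.5, 0.0
naive, corrected = [], []
for _ in range(5000):
    i = rng.integers(N)               # data-subsampling noise
    eps = rng.normal()                # Monte Carlo noise
    g = grad_estimate(mu, i, eps)
    c = control_variate(mu_anchor, i, eps)
    naive.append(g)
    corrected.append(g - c + cv_expectation(mu_anchor))  # unbiased correction

print("naive variance:    ", np.var(naive))
print("corrected variance:", np.var(corrected))
```

In this linear toy the correction cancels the noise exactly, so the corrected variance is essentially zero; in realistic models the cancellation is only partial, and the practical question, which the paper addresses, is how to build a control variate whose expectation over both noise sources remains cheap to track during optimization.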

Authors (3)
  1. Xi Wang (275 papers)
  2. Tomas Geffner (19 papers)
  3. Justin Domke (39 papers)

