The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning (2110.14427v6)
Abstract: The paper concerns the $d$-dimensional stochastic approximation recursion, $$ \theta_{n+1} = \theta_n + \alpha_{n+1} f(\theta_n, \Phi_{n+1}), $$ where $\{ \Phi_n \}$ is a stochastic process on a general state space satisfying a conditional Markov property that allows for parameter-dependent noise. The main results are established under additional conditions on the mean flow and a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3): (i) An appropriate Lyapunov function is constructed that implies convergence of the estimates in $L_4$. (ii) A functional central limit theorem (CLT) is established, as well as the usual one-dimensional CLT for the normalized error. Moment bounds combined with the CLT imply convergence of the normalized covariance $\mathsf{E}[ z_n z_n^\top ]$ to the asymptotic covariance in the CLT, where $z_n := (\theta_n - \theta^*)/\sqrt{\alpha_n}$. (iii) The CLT holds for the normalized version $z^{\text{PR}}_n := \sqrt{n}\, [\theta^{\text{PR}}_n - \theta^*]$ of the averaged parameters $\theta^{\text{PR}}_n := n^{-1} \sum_{k=1}^{n} \theta_k$, subject to standard assumptions on the step-size. Moreover, the covariance in the CLT coincides with the minimal covariance of Polyak and Ruppert. (iv) An example is given in which $f$ and $\bar{f}$ are linear in $\theta$ and $\Phi$ is a geometrically ergodic Markov chain that does not satisfy (DV3). While the algorithm is convergent, the second moment of $\theta_n$ is unbounded and in fact diverges. This arXiv version represents a major extension of the results in prior versions: the main results now allow for parameter-dependent noise, as is often the case in applications to reinforcement learning.
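To fix ideas, here is a minimal simulation sketch of the recursion together with Polyak-Ruppert averaging. The two-state noise chain, the linear choice of $f$, and the step-size schedule $\alpha_n = n^{-0.85}$ are illustrative assumptions, not taken from the paper; under the paper's conditions the averaged iterate attains the minimal (Polyak-Ruppert) asymptotic covariance.

```python
# Sketch (illustrative assumptions, not the paper's setup): the recursion
#   theta_{n+1} = theta_n + alpha_{n+1} * f(theta_n, Phi_{n+1})
# driven by a two-state Markov chain, with Polyak-Ruppert averaging.
import numpy as np

rng = np.random.default_rng(0)

# Transition matrix of the illustrative noise chain Phi on {0, 1}.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Linear-in-theta example: f(theta, phi) = A[phi] @ theta + b[phi];
# each A[phi] is Hurwitz, so the associated mean flow is stable.
A = [np.array([[-1.0, 0.0], [0.2, -0.5]]),
     np.array([[-0.5, 0.1], [0.0, -1.0]])]
b = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

def f(theta, phi):
    return A[phi] @ theta + b[phi]

n_steps = 200_000
theta = np.zeros(2)
phi = 0
running_sum = np.zeros(2)

for n in range(1, n_steps + 1):
    alpha = n ** -0.85             # alpha_n = n^{-rho}, rho in (1/2, 1): standard for averaging
    phi = rng.choice(2, p=P[phi])  # advance the Markovian noise
    theta = theta + alpha * f(theta, phi)
    running_sum += theta

theta_pr = running_sum / n_steps   # theta^PR_n = n^{-1} sum_{k=1}^n theta_k
print("last iterate theta_n  :", theta)
print("PR average theta^PR_n :", theta_pr)
```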
- F. Bach. Learning Theory from First Principles. In Preparation, 2021.
- F. Bach and E. Moulines. Non-strongly-convex smooth stochastic approximation with convergence rate $O(1/n)$. In Proc. Advances in Neural Information Processing Systems, volume 26, pages 773–781, 2013.
- A. Benveniste, M. Métivier, and P. Priouret. Adaptive Algorithms and Stochastic Approximations, volume 22. Springer Science & Business Media, Berlin Heidelberg, 2012.
- S. Bhatnagar. The Borkar–Meyn Theorem for asynchronous stochastic approximations. Systems & Control Letters, 60(7):472–478, 2011.
- P. Billingsley. Convergence of Probability Measures. John Wiley & Sons, New York, 1968.
- P. Billingsley. Probability and Measure. John Wiley & Sons, New York, 1995.
- V. S. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint. Hindustan Book Agency, Delhi, India, 2nd edition, 2021.
- V. S. Borkar and S. P. Meyn. The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control Optim., 38(2):447–469, 2000.
- K. L. Chung. On a stochastic approximation method. The Annals of Mathematical Statistics, 25(3):463–483, 1954.
- J. G. Dai. On positive Harris recurrence of multiclass queueing networks: a unified approach via fluid limit models. Ann. Appl. Probab., 5(1):49–77, 1995.
- J. G. Dai and S. P. Meyn. Stability and convergence of moments for multiclass queueing networks via fluid limit models. IEEE Trans. Automat. Control, 40:1889–1904, 1995.
- A. M. Devraj, I. Kontoyiannis, and S. P. Meyn. Geometric ergodicity in a weighted Sobolev space. Ann. Probab., 48(1):380–403, 2020.
- M. Donsker and S. Varadhan. Asymptotic evaluation of certain Markov process expectations for large time, I and II. Comm. Pure Appl. Math., 28:1–47 and 279–301, 1975.
- M. Donsker and S. Varadhan. Asymptotic evaluation of certain Markov process expectations for large time. III. Comm. Pure Appl. Math., 29(4):389–461, 1976.
- J. L. Doob. Stochastic Processes. John Wiley & Sons, New York, 1953.
- K. Duffy and S. Meyn. Large deviation asymptotics for busy periods. Stochastic Systems, 4(1):300–319, 2014.
- A. Durmus, E. Moulines, A. Naumov, S. Samsonov, K. Scaman, and H.-T. Wai. Tight high probability bounds for linear stochastic approximation with fixed step-size. In Advances in Neural Information Processing Systems, volume 34, pages 30063–30074, 2021. Also arXiv:2106.01257.
- A. Durmus, E. Moulines, A. Naumov, S. Samsonov, and H.-T. Wai. On the stability of random matrix product with Markovian noise: Application to linear stochastic approximation and TD learning. In Conference on Learning Theory, pages 1711–1752. PMLR, 2021.
- S. N. Ethier and T. G. Kurtz. Markov Processes: Characterization and Convergence, volume 282. John Wiley & Sons, 2005.
- V. Fabian. On asymptotic normality in stochastic approximation. The Annals of Mathematical Statistics, 39(4):1327–1332, 1968.
- G. Fort, S. Meyn, E. Moulines, and P. Priouret. ODE methods for skip-free Markov chain stability with applications to MCMC. Ann. Appl. Probab., 18(2):664–707, 2008.
- A. Ganesh and N. O’Connell. A large deviation principle with queueing applications. Stochastics and Stochastic Reports, 73(1-2):25–35, 2002.
- V. Gaposkin and T. Krasulina. On the law of the iterated logarithm in stochastic approximation processes. Theory of Probability & Its Applications, 19(4):844–850, 1975.
- L. Gerencsér. Rate of convergence of recursive estimators. SIAM Journal on Control and Optimization, 30(5):1200–1227, 1992.
- P. W. Glynn and S. P. Meyn. A Liapounov bound for solutions of the Poisson equation. Ann. Probab., 24(2):916–931, 1996.
- J. Huang, I. Kontoyiannis, and S. P. Meyn. The ODE method and spectral theory of Markov operators. In T. E. Duncan and B. Pasik-Duncan, editors, Proc. of the workshop held at the University of Kansas, Lawrence, Kansas, October 18–20, 2001, volume 280 of Lecture Notes in Control and Information Sciences, pages 205–222, Berlin, 2002. Springer-Verlag.
- J. A. Joslin and A. J. Heunis. Law of the iterated logarithm for a constant-gain linear stochastic gradient algorithm. SIAM Journal on Control and Optimization, 39(2):533–570, 2000.
- M. Kaledin, E. Moulines, A. Naumov, V. Tadic, and H.-T. Wai. Finite time analysis of linear two-timescale stochastic approximation with Markovian noise. arXiv e-prints, page arXiv:2002.01268, Feb. 2020.
- B. Karimi, B. Miasojedow, E. Moulines, and H.-T. Wai. Non-asymptotic analysis of biased stochastic approximation scheme. In Conference on Learning Theory, pages 1944–1974. PMLR, 2019.
- P. Karmakar and S. Bhatnagar. Two time-scale stochastic approximation with controlled Markov noise and off-policy temporal-difference learning. Math. Oper. Res., 43(1):130–151, 2018.
- V. Konda. Actor-critic algorithms. PhD thesis, Massachusetts Institute of Technology, 2002.
- I. Kontoyiannis and S. P. Meyn. Spectral theory and limit theorems for geometrically ergodic Markov processes. Ann. Appl. Probab., 13:304–362, 2003.
- I. Kontoyiannis and S. P. Meyn. Large deviations asymptotics and the spectral theory of multiplicatively regular Markov processes. Electron. J. Probab., 10(3):61–123 (electronic), 2005.
- H. J. Kushner and G. G. Yin. Stochastic Approximation Algorithms and Applications, volume 35 of Applications of Mathematics (New York). Springer-Verlag, New York, 1997.
- C. K. Lauand and S. Meyn. Markovian foundations for quasi-stochastic approximation with applications to extremum seeking control. arXiv:2207.06371, 2022.
- C. K. Lauand and S. Meyn. The curse of memory in stochastic approximation. In Proc. IEEE Conference on Decision and Control, pages 7803–7809, 2023.
- M. Métivier and P. Priouret. Théorèmes de convergence presque sûre pour une classe d'algorithmes stochastiques à pas décroissants [Almost sure convergence theorems for a class of stochastic algorithms with decreasing step sizes]. Prob. Theory Related Fields, 74:403–428, 1987.
- S. Meyn. Control Systems and Reinforcement Learning. Cambridge University Press, Cambridge, 2022.
- S. P. Meyn. Control Techniques for Complex Networks. Cambridge University Press, 2007. Pre-publication edition available online.
- S. P. Meyn and R. L. Tweedie. Markov Chains and Stochastic Stability. Cambridge University Press, Cambridge, second edition, 2009. Published in the Cambridge Mathematical Library; 1993 edition online.
- A. Mokkadem and M. Pelletier. The compact law of the iterated logarithm for multivariate stochastic approximation algorithms. Stochastic analysis and applications, 23(1):181–203, 2005.
- E. Moulines and F. R. Bach. Non-asymptotic analysis of stochastic approximation algorithms for machine learning. In Advances in Neural Information Processing Systems 24, pages 451–459, 2011.
- E. Nummelin. General Irreducible Markov Chains and Nonnegative Operators. Cambridge University Press, Cambridge, 1984.
- G. Patil, L. A. Prashanth, D. Nagaraj, and D. Precup. Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation. In F. Ruiz, J. Dy, and J.-W. van de Meent, editors, Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, volume 206 of Proceedings of Machine Learning Research, pages 5438–5448. PMLR, 25–27 Apr 2023.
- H. Pezeshki-Esfahani and A. Heunis. Strong diffusion approximations for recursive stochastic algorithms. IEEE Trans. Inform. Theory, 43(2):512–523, Mar 1997.
- B. T. Polyak. A new method of stochastic approximation type. Avtomatika i Telemekhanika (in Russian), 1990; translated in Automat. Remote Control, 51:98–107, 1991.
- B. T. Polyak and A. B. Juditsky. Acceleration of stochastic approximation by averaging. SIAM J. Control Optim., 30(4):838–855, 1992.
- Q. Qin and J. P. Hobert. Geometric convergence bounds for Markov chains in Wasserstein distance based on generalized drift and contraction conditions. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 58:872–889, 2022.
- A. Ramaswamy and S. Bhatnagar. A generalization of the Borkar-Meyn Theorem for stochastic recursive inclusions. Mathematics of Operations Research, 42(3):648–661, 2017.
- A. Ramaswamy and S. Bhatnagar. Stability of stochastic approximations with ‘controlled Markov’ noise and temporal difference learning. Trans. on Automatic Control, 64:2614–2620, 2019.
- H. Robbins and S. Monro. A stochastic approximation method. Annals of Mathematical Statistics, 22:400–407, 1951.
- D. Ruppert. A Newton-Raphson version of the multivariate Robbins-Monro procedure. The Annals of Statistics, 13(1):236–245, 1985.
- D. Ruppert. Efficient estimators from a slowly convergent Robbins-Monro process. Technical Report 781, Cornell University, School of Operations Research and Industrial Engineering, Ithaca, NY, 1988.
- J. Sacks. Asymptotic distribution of stochastic approximation procedures. The Annals of Mathematical Statistics, 29(2):373–405, 1958.
- R. Srikant and L. Ying. Finite-time error bounds for linear stochastic approximation and TD learning. In Conference on Learning Theory, pages 2803–2830, 2019.
- R. Sutton and A. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 2nd edition, 2018.
- C. Szepesvári. The asymptotic convergence-rate of Q-learning. In Proc. of the Intl. Conference on Neural Information Processing Systems, pages 1064–1070, Cambridge, MA, 1997.
- M. Vidyasagar. A new converse Lyapunov theorem for global exponential stability and applications to stochastic approximation. IEEE Trans. Automat. Control, pages 2319–2321, 2022. Extended version on arXiv:2205.01303.
- M. Vidyasagar. Convergence of stochastic approximation via martingale and converse Lyapunov methods. Mathematics of Control, Signals, and Systems, pages 1–24, 2023.
- F. Zarin Faizal and V. Borkar. Functional Central Limit Theorem for Two Timescale Stochastic Approximation. arXiv e-prints, page arXiv:2306.05723, June 2023.
- Y. Zhu. Asymptotic normality for a vector stochastic difference equation with applications in stochastic approximation. Journal of Multivariate Analysis, 57(1):101–118, 1996.