The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning

(2110.14427)
Published Oct 27, 2021 in math.ST, cs.LG, and stat.TH

Abstract

The paper concerns the stochastic approximation recursion,
\[ \theta_{n+1} = \theta_n + \alpha_{n+1} f(\theta_n, \Phi_{n+1}) \,, \quad n \ge 0, \]
where the {\em estimates} $\theta_n \in \mathbb{R}^d$ and $\{\Phi_n\}$ is a Markov chain on a general state space. In addition to standard Lipschitz assumptions and conditions on the vanishing step-size sequence, it is assumed that the associated \textit{mean flow} $\tfrac{d}{dt} \vartheta_t = \bar{f}(\vartheta_t)$ is globally asymptotically stable with stationary point denoted $\theta^*$, where $\bar{f}(\theta) = \text{E}[f(\theta, \Phi)]$ with $\Phi$ having the stationary distribution of the chain. The main results are established under additional conditions on the mean flow and a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3) for the chain: (i) An appropriate Lyapunov function is constructed that implies convergence of the estimates in $L_4$. (ii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error. Moment bounds combined with the CLT imply convergence of the normalized covariance $\text{E}[z_n z_n^T]$ to the asymptotic covariance $\Sigma_\Theta$ in the CLT, where $z_n = (\theta_n - \theta^*)/\sqrt{\alpha_n}$. (iii) The CLT holds for the normalized version $z^{\text{PR}}_n$ of the averaged parameters $\theta^{\text{PR}}_n$, subject to standard assumptions on the step-size. Moreover, the normalized covariance of both $\theta^{\text{PR}}_n$ and $z^{\text{PR}}_n$ converges to $\Sigma^{\text{PR}}$, the minimal covariance of Polyak and Ruppert. (iv) An example is given where $f$ and $\bar{f}$ are linear in $\theta$, and the Markov chain is geometrically ergodic but does not satisfy (DV3). While the algorithm is convergent, the second moment of $\theta_n$ is unbounded and in fact diverges.
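
To make the recursion and the Polyak-Ruppert averaging concrete, here is a minimal Python sketch, not taken from the paper: the two-state chain, the linear-in-$\theta$ function $f$, and the step-size exponent are illustrative assumptions chosen so that the mean flow is globally asymptotically stable. It runs the SA recursion driven by a Markov chain and maintains the running average $\theta^{\text{PR}}_n$:

```python
import numpy as np

# Minimal sketch (not the paper's experiments): the SA recursion
#   theta_{n+1} = theta_n + alpha_{n+1} f(theta_n, Phi_{n+1})
# driven by a two-state Markov chain, with Polyak-Ruppert averaging.
# The chain, f, and step-size below are illustrative assumptions.

rng = np.random.default_rng(0)

# Two-state Markov chain Phi on {0, 1} with transition matrix P.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
b = np.array([1.0, -0.5])  # state-dependent term (assumed)

def f(theta, phi):
    # Linear-in-theta example: f(theta, phi) = -theta + b[phi].
    # The mean flow d/dt vartheta = -vartheta + E[b(Phi)] is
    # globally asymptotically stable.
    return -theta + b[phi]

n_steps = 100_000
theta = 0.0
theta_pr = 0.0  # running Polyak-Ruppert average of the estimates
phi = 0

for n in range(1, n_steps + 1):
    phi = rng.choice(2, p=P[phi])          # advance the Markov chain
    alpha = 1.0 / n**0.85                  # vanishing step-size alpha_n
    theta = theta + alpha * f(theta, phi)  # SA recursion
    theta_pr += (theta - theta_pr) / n     # averaged parameters theta^PR_n

# The stationary distribution pi of P solves pi P = pi; here pi = (2/3, 1/3),
# so bar{f}(theta) = -theta + pi @ b and theta* = pi @ b = 0.5.
print(f"theta_n = {theta:.4f}, theta_PR_n = {theta_pr:.4f}, theta* = 0.5")
```

In this linear example both $\theta_n$ and $\theta^{\text{PR}}_n$ converge to $\theta^* = 0.5$; the paper's results concern the fluctuations of the normalized errors $z_n$ and $z^{\text{PR}}_n$ around such a limit, and the conditions (such as (DV3)) under which their normalized covariances converge.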
