Convergence to minima for the continuous version of Backtracking Gradient Descent (1911.04221v2)
Abstract: The main result of this paper is: {\bf Theorem.} Let $f:\mathbb{R}^k\rightarrow \mathbb{R}$ be a $C^{1}$ function such that $\nabla f$ is locally Lipschitz continuous. Assume moreover that $f$ is $C^2$ near its generalised saddle points. Fix real numbers $\delta_0>0$ and $0<\alpha <1$. Then there is a smooth function $h:\mathbb{R}^k\rightarrow (0,\delta_0]$ such that the map $H:\mathbb{R}^k\rightarrow \mathbb{R}^k$ defined by $H(x)=x-h(x)\nabla f(x)$ has the following properties: (i) For all $x\in \mathbb{R}^k$, we have $f(H(x))-f(x)\leq -\alpha h(x)||\nabla f(x)||^2$. (ii) For every $x_0\in \mathbb{R}^k$, the sequence $x_{n+1}=H(x_n)$ either satisfies $\lim_{n\rightarrow\infty}||x_{n+1}-x_n||=0$ or $\lim_{n\rightarrow\infty}||x_n||=\infty$. Each cluster point of $\{x_n\}$ is a critical point of $f$. If moreover $f$ has at most countably many critical points, then $\{x_n\}$ either converges to a critical point of $f$ or $\lim_{n\rightarrow\infty}||x_n||=\infty$. (iii) There is a set $\mathcal{E}_1\subset \mathbb{R}^k$ of Lebesgue measure $0$ such that for all $x_0\in \mathbb{R}^k\backslash \mathcal{E}_1$, the sequence $x_{n+1}=H(x_n)$, {\bf if it converges}, cannot converge to a {\bf generalised} saddle point. (iv) There is a set $\mathcal{E}_2\subset \mathbb{R}^k$ of Lebesgue measure $0$ such that for all $x_0\in \mathbb{R}^k\backslash \mathcal{E}_2$, no cluster point of the sequence $x_{n+1}=H(x_n)$ is a saddle point; more generally, no cluster point can be an isolated generalised saddle point. Some other results are also proven.
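For orientation, here is a minimal sketch of the standard (discrete) Backtracking Gradient Descent that the title refers to. It is not the paper's construction: the paper builds a smooth step-size function $h(x)$, whereas this sketch recomputes the step at every iterate by shrinking $\delta_0$ until the Armijo condition in property (i), $f(x-h\nabla f(x))-f(x)\leq -\alpha h||\nabla f(x)||^2$, holds. The shrink factor `beta`, the cap `max_backtracks`, and the function names `armijo_step` and `backtracking_gd` are illustrative assumptions, not taken from the source.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def armijo_step(
    f: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    x: Vector,
    delta0: float = 1.0,
    alpha: float = 0.5,
    beta: float = 0.5,          # assumed shrink factor for the step size
    max_backtracks: int = 60,   # assumed floating-point safeguard
) -> Tuple[float, Vector]:
    """Return (h, x_new) with h = delta0 * beta**m the first (largest) step
    satisfying f(x - h*g) - f(x) <= -alpha * h * ||g||^2.
    Returns (0.0, x) if x is a critical point or no step is accepted."""
    g = grad(x)
    g_sq = sum(gi * gi for gi in g)
    if g_sq == 0.0:
        return 0.0, list(x)  # critical point: stay put
    fx = f(x)
    h = delta0
    for _ in range(max_backtracks):
        x_new = [xi - h * gi for xi, gi in zip(x, g)]
        if f(x_new) - fx <= -alpha * h * g_sq:  # Armijo condition
            return h, x_new
        h *= beta  # step too long: shrink and retry
    return 0.0, list(x)  # safeguard triggered: do not move


def backtracking_gd(
    f: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    x0: Vector,
    n_iter: int = 100,
    **kwargs,
) -> Vector:
    """Iterate x_{n+1} = x_n - h_n * grad f(x_n) with Armijo backtracking."""
    x = list(x0)
    for _ in range(n_iter):
        h, x = armijo_step(f, grad, x, **kwargs)
        if h == 0.0:  # critical point reached (or safeguard hit)
            break
    return x
```

For example, with $f(x)=(x_1-1)^2+2(x_2+3)^2$ and start $(0,0)$, the first accepted step is $h=0.25$, and the iterates reach the minimiser $(1,-3)$. The paper's contribution is to replace this per-iterate search by a single smooth $h$ while keeping the descent property (i), which is what makes the measure-zero arguments in (iii) and (iv) possible.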