Convergence to minima for the continuous version of Backtracking Gradient Descent (1911.04221v2)
Abstract: The main result of this paper is: {\bf Theorem.} Let $f:\mathbb{R}^k\rightarrow \mathbb{R}$ be a $C^{1}$ function, so that $\nabla f$ is locally Lipschitz continuous. Assume moreover that $f$ is $C^2$ near its generalised saddle points. Fix real numbers $\delta_0>0$ and $0<\alpha <1$. Then there is a smooth function $h:\mathbb{R}^k\rightarrow (0,\delta_0]$ so that the map $H:\mathbb{R}^k\rightarrow \mathbb{R}^k$ defined by $H(x)=x-h(x)\nabla f(x)$ has the following properties: (i) For all $x\in \mathbb{R}^k$, we have $f(H(x))-f(x)\leq -\alpha h(x)||\nabla f(x)||^2$. (ii) For every $x_0\in \mathbb{R}^k$, the sequence $x_{n+1}=H(x_n)$ either satisfies $\lim_{n\rightarrow\infty}||x_{n+1}-x_n||=0$ or $\lim_{n\rightarrow\infty}||x_n||=\infty$. Each cluster point of $\{x_n\}$ is a critical point of $f$. If moreover $f$ has at most countably many critical points, then $\{x_n\}$ either converges to a critical point of $f$ or $\lim_{n\rightarrow\infty}||x_n||=\infty$. (iii) There is a set $\mathcal{E}_1\subset \mathbb{R}^k$ of Lebesgue measure $0$ so that for all $x_0\in \mathbb{R}^k\backslash \mathcal{E}_1$, the sequence $x_{n+1}=H(x_n)$, {\bf if it converges}, cannot converge to a {\bf generalised} saddle point. (iv) There is a set $\mathcal{E}_2\subset \mathbb{R}^k$ of Lebesgue measure $0$ so that for all $x_0\in \mathbb{R}^k\backslash \mathcal{E}_2$, any cluster point of the sequence $x_{n+1}=H(x_n)$ is not a saddle point, and more generally cannot be an isolated generalised saddle point. Some other results are proven.
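To make property (i) concrete, here is a minimal Python sketch of the familiar discrete backtracking rule: start from $\delta_0$ and shrink the step until Armijo's inequality $f(x-\delta\nabla f(x))-f(x)\leq -\alpha\delta||\nabla f(x)||^2$ holds. This is not the smooth step-size function $h$ constructed in the paper; it only illustrates the inequality that $h$ is required to satisfy. The names `backtracking_step` and `iterate`, the shrink factor `beta`, the cap `max_halvings`, and the test function are all assumptions made for illustration.

```python
def backtracking_step(f, grad, x, delta0=1.0, alpha=0.5, beta=0.5, max_halvings=60):
    """Return (delta, y) with y = x - delta * grad(x) satisfying Armijo's condition.

    Illustrative discrete rule only; the paper's h(x) is a smooth function,
    which this piecewise-constant choice is not.
    """
    g = grad(x)
    g2 = sum(gi * gi for gi in g)          # squared gradient norm
    if g2 == 0.0:                          # already at a critical point
        return delta0, list(x)
    fx = f(x)
    delta = delta0
    for _ in range(max_halvings):
        y = [xi - delta * gi for xi, gi in zip(x, g)]
        if f(y) - fx <= -alpha * delta * g2:   # Armijo's condition, as in (i)
            return delta, y
        delta *= beta                      # shrink the step and retry
    return 0.0, list(x)                    # no admissible step found: stay put


def iterate(f, grad, x0, n_steps=200, **kwargs):
    """Apply x_{n+1} = x_n - delta_n * grad f(x_n) for n_steps iterations."""
    x = list(x0)
    for _ in range(n_steps):
        _, x = backtracking_step(f, grad, x, **kwargs)
    return x


# Hypothetical test function: saddle point at (0, 0), minima at (0, 1) and (0, -1).
def f(p):
    x, y = p
    return x * x + y ** 4 / 4 - y * y / 2


def grad_f(p):
    x, y = p
    return [2 * x, y ** 3 - y]


x_final = iterate(f, grad_f, [1.0, 0.1])
```

Starting near the $x$-axis, which is attracted to the saddle at the origin, the iterates in this example still drift away and settle near the minimum $(0,1)$. This is consistent with, though of course not a proof of, the saddle-avoidance statements (iii) and (iv).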