Non-convex Finite-Sum Optimization Via SCSG Methods
(1706.09156)Abstract
We develop a class of algorithms, as variants of the stochastically controlled stochastic gradient (SCSG) methods (Lei and Jordan, 2016), for the smooth non-convex finite-sum optimization problem. Assuming the smoothness of each component, the complexity of SCSG to reach a stationary point with $\mathbb{E} |\nabla f(x)|{2}\le \epsilon$ is $O\left (\min{\epsilon{-5/3}, \epsilon{-1}n{2/3}}\right)$, which strictly outperforms the stochastic gradient descent. Moreover, SCSG is never worse than the state-of-the-art methods based on variance reduction and it significantly outperforms them when the target accuracy is low. A similar acceleration is also achieved when the functions satisfy the Polyak-Lojasiewicz condition. Empirical experiments demonstrate that SCSG outperforms stochastic gradient methods on training multi-layers neural networks in terms of both training and validation loss.
We're not able to analyze this paper right now due to high demand.
Please check back later (sorry!).
Generate a summary of this paper on our Pro plan:
We ran into a problem analyzing this paper.