Decomposable Non-Smooth Convex Optimization with Nearly-Linear Gradient Oracle Complexity
(2208.03811)
Abstract
Many fundamental problems in machine learning can be formulated by the convex program \[ \min_{\theta\in\mathbb{R}^d}\ \sum_{i=1}^{n} f_i(\theta), \] where each $f_i$ is a convex, Lipschitz function supported on a subset of $d_i$ coordinates of $\theta$. One common approach to this problem, exemplified by stochastic gradient descent, involves sampling one $f_i$ term at every iteration to make progress. This approach crucially relies on a notion of uniformity across the $f_i$'s, formally captured by their condition number. In this work, we give an algorithm that minimizes the above convex formulation to $\epsilon$-accuracy in $\widetilde{O}(\sum_{i=1}^{n} d_i \log(1/\epsilon))$ gradient computations, with no assumptions on the condition number. The previous best algorithm independent of the condition number is the standard cutting plane method, which requires $O(nd \log(1/\epsilon))$ gradient computations. As a corollary, we improve upon the evaluation oracle complexity for decomposable submodular minimization by Axiotis et al. (ICML 2021). Our main technical contribution is an adaptive procedure to select an $f_i$ term at every iteration via a novel combination of cutting-plane and interior-point methods.
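To make the problem structure concrete, the following is a minimal sketch of a decomposable objective in which each term touches only a few coordinates of $\theta$. It is not the algorithm of the paper: the helper names (`make_term`, `objective`, `full_subgradient`) and the choice of absolute-value terms are assumptions made purely for illustration. The sketch only shows how a subgradient of each $f_i$ lives on its $d_i$ supported coordinates, and compares the two quantities $\sum_i d_i$ and $nd$ that appear in the bounds above.

```python
import numpy as np


def make_term(support, weights, offset):
    """Hypothetical term f_i(theta) = |<weights, theta[support]> - offset|.

    The term is convex and Lipschitz, and depends only on the
    d_i = len(support) coordinates listed in `support`.
    """
    support = np.asarray(support)
    weights = np.asarray(weights, dtype=float)

    def value(theta):
        return abs(weights @ theta[support] - offset)

    def subgradient(theta):
        # A subgradient restricted to the supported coordinates (length d_i).
        residual = weights @ theta[support] - offset
        return np.sign(residual) * weights

    return support, value, subgradient


def objective(terms, theta):
    # The decomposable objective: sum of f_i(theta) over all terms.
    return sum(value(theta) for _, value, _ in terms)


def full_subgradient(terms, theta):
    # Scatter each term's d_i-dimensional subgradient into R^d and sum.
    g = np.zeros_like(theta, dtype=float)
    for support, _, subgradient in terms:
        np.add.at(g, support, subgradient(theta))
    return g


d = 6
terms = [
    make_term([0, 1], [1.0, -1.0], 0.5),
    make_term([1, 2, 3], [1.0, 1.0, 1.0], 2.0),
    make_term([4, 5], [2.0, 1.0], -1.0),
]
theta = np.zeros(d)

total = objective(terms, theta)
g = full_subgradient(terms, theta)

# The two quantities that the stated bounds scale with (ignoring log factors).
sum_d_i = sum(len(support) for support, _, _ in terms)
n_times_d = len(terms) * d
print(total, g, sum_d_i, n_times_d)
```

In this toy instance $\sum_i d_i = 7$ while $nd = 18$; the gap widens when many terms each depend on only a few coordinates of a high-dimensional $\theta$.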