Tractability from overparametrization: The example of the negative perceptron
arXiv:2110.15824

Abstract
In the negative perceptron problem we are given $n$ data points $({\boldsymbol x}_i,y_i)$, where ${\boldsymbol x}_i$ is a $d$-dimensional vector and $y_i\in\{+1,-1\}$ is a binary label. The data are not linearly separable and hence we content ourselves with finding a linear classifier with the largest possible \emph{negative} margin. In other words, we want to find a unit-norm vector ${\boldsymbol \theta}$ that maximizes $\min_{i\le n} y_i\langle {\boldsymbol \theta},{\boldsymbol x}_i\rangle$. This is a non-convex optimization problem (it is equivalent to finding a maximum-norm vector in a polytope), and we study its typical properties under two random models for the data. We consider the proportional asymptotics in which $n,d\to \infty$ with $n/d\to\delta$, and prove upper and lower bounds on the maximum margin $\kappa_{\text{s}}(\delta)$ or -- equivalently -- on its inverse function $\delta_{\text{s}}(\kappa)$. In other words, $\delta_{\text{s}}(\kappa)$ is the overparametrization threshold: for $n/d\le \delta_{\text{s}}(\kappa)-\varepsilon$ a classifier achieving vanishing training error exists with high probability, while for $n/d\ge \delta_{\text{s}}(\kappa)+\varepsilon$ it does not. Our bounds on $\delta_{\text{s}}(\kappa)$ match to the leading order as $\kappa\to -\infty$. We then analyze a linear programming algorithm to find a solution, and characterize the corresponding threshold $\delta_{\text{lin}}(\kappa)$. We observe a gap between the interpolation threshold $\delta_{\text{s}}(\kappa)$ and the linear programming threshold $\delta_{\text{lin}}(\kappa)$, raising the question of the behavior of other algorithms.
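To make the polytope reformulation concrete, here is a minimal numerical sketch. After rescaling ${\boldsymbol \theta}\mapsto{\boldsymbol \theta}/|\kappa|$ (for $\kappa<0$), maximizing the negative margin amounts to finding a maximum-norm point of the polytope $\{{\boldsymbol \theta}: y_i\langle {\boldsymbol x}_i,{\boldsymbol \theta}\rangle \ge -1\ \forall i\}$. A natural LP surrogate maximizes a random linear objective over this polytope, which lands on a vertex. This is an illustrative stand-in, not necessarily the exact linear programming algorithm analyzed in the paper; the Gaussian data model, random labels, and the function name `lp_negative_perceptron` are assumptions made for the sketch.

```python
import numpy as np
from scipy.optimize import linprog

def lp_negative_perceptron(X, y, rng):
    """LP surrogate for the negative perceptron (illustrative, not the paper's
    exact algorithm): maximize <g, theta> for a random direction g over the
    polytope P = {theta : y_i <x_i, theta> >= -1}, then normalize the vertex."""
    n, d = X.shape
    g = rng.standard_normal(d)
    # linprog minimizes, so pass -g; the constraints y_i <x_i, theta> >= -1
    # become -(y_i * x_i) . theta <= 1.
    A_ub = -(y[:, None] * X)
    b_ub = np.ones(n)
    res = linprog(-g, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * d)
    theta = res.x / np.linalg.norm(res.x)        # report a unit-norm classifier
    margin = float(np.min(y * (X @ theta)))      # achieved (negative) margin
    return theta, margin

rng = np.random.default_rng(0)
d, n = 20, 400                      # delta = n/d = 20: far above the separability
X = rng.standard_normal((n, d)) / np.sqrt(d)  # threshold, so the margin is negative
y = rng.choice([-1.0, 1.0], size=n)
theta, margin = lp_negative_perceptron(X, y, rng)
```

For $n/d$ this large the vectors $y_i{\boldsymbol x}_i$ positively span ${\mathbb R}^d$ with high probability, so the polytope is bounded and the LP returns a finite vertex; a longer vector in the polytope translates, after normalization, into a less negative achieved margin.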