Hardness of Agnostically Learning Halfspaces from Worst-Case Lattice Problems (2207.14030v2)

Published 28 Jul 2022 in cs.LG, cs.CC, math.ST, stat.ML, and stat.TH

Abstract: We show hardness of improperly learning halfspaces in the agnostic model, both in the distribution-independent as well as the distribution-specific setting, based on the assumption that worst-case lattice problems, such as GapSVP or SIVP, are hard. In particular, we show that under this assumption there is no efficient algorithm that outputs any binary hypothesis, not necessarily a halfspace, achieving misclassfication error better than $\frac 1 2 - \gamma$ even if the optimal misclassification error is as small is as small as $\delta$. Here, $\gamma$ can be smaller than the inverse of any polynomial in the dimension and $\delta$ as small as $exp(-\Omega(\log^{1-c}(d)))$, where $0 < c < 1$ is an arbitrary constant and $d$ is the dimension. For the distribution-specific setting, we show that if the marginal distribution is standard Gaussian, for any $\beta > 0$ learning halfspaces up to error $OPT_{LTF} + \epsilon$ takes time at least $d^{{\tilde{\Omega}(1/\epsilon^{2-\beta})}$} under the same hardness assumptions. Similarly, we show that learning degree-$\ell$ polynomial threshold functions up to error $OPT_{{PTF}\ell} + \epsilon$ takes time at least $d^{{\tilde{\Omega}(\ell^{{2-\beta}/\epsilon^{{2-\beta})}$.}}} $OPT{LTF}$ and $OPT_{{PTF}_\ell}$ denote the best error achievable by any halfspace or polynomial threshold function, respectively. Our lower bounds qualitively match algorithmic guarantees and (nearly) recover known lower bounds based on non-worst-case assumptions. Previously, such hardness results [Daniely16, DKPZ21] were based on average-case complexity assumptions or restricted to the statistical query model. Our work gives the first hardness results basing these fundamental learning problems on worst-case complexity assumptions. It is inspired by a sequence of recent works showing hardness of learning well-separated Gaussian mixtures based on worst-case lattice problems.