Outlier Robust Multivariate Polynomial Regression

Published Mar 14, 2024 in cs.DS, cs.LG, and stat.ML


We study the problem of robust multivariate polynomial regression: let $p\colon\mathbb{R}^n\to\mathbb{R}$ be an unknown $n$-variate polynomial of degree at most $d$ in each variable. We are given as input a set of random samples $(\mathbf{x}_i,y_i) \in [-1,1]^n \times \mathbb{R}$ that are noisy versions of $(\mathbf{x}_i,p(\mathbf{x}_i))$. More precisely, each $\mathbf{x}_i$ is sampled independently from some distribution $\chi$ on $[-1,1]^n$, and for each $i$ independently, $y_i$ is arbitrary (i.e., an outlier) with probability at most $\rho < 1/2$, and otherwise satisfies $|y_i-p(\mathbf{x}_i)|\leq\sigma$. The goal is to output a polynomial $\hat{p}$, of degree at most $d$ in each variable, within an $\ell_\infty$-distance of at most $O(\sigma)$ from $p$. Kane, Karmalkar, and Price [FOCS'17] solved this problem for $n=1$. We generalize their results to the $n$-variate setting, showing an algorithm that achieves a sample complexity of $O_n(d^n\log d)$, where the hidden constant depends on $n$, if $\chi$ is the $n$-dimensional Chebyshev distribution. The sample complexity is $O_n(d^{2n}\log d)$ if the samples are drawn from the uniform distribution instead. The approximation error is guaranteed to be at most $O(\sigma)$, and the run-time depends on $\log(1/\sigma)$. In the setting where each $\mathbf{x}_i$ and $y_i$ are known up to $N$ bits of precision, the run-time's dependence on $N$ is linear. We also show that our sample complexities are optimal in terms of $d^n$. Furthermore, we show that it is possible to have the run-time be independent of $1/\sigma$, at the cost of a higher sample complexity.
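To make the sampling model concrete, the following is a minimal sketch of a data generator for it, not the paper's algorithm. The names `sample_chebyshev`, `make_noisy_samples`, and `outlier_scale` are hypothetical. Two simplifying assumptions are made: the $n$-dimensional Chebyshev distribution is taken to be the product of one-dimensional Chebyshev (arcsine) distributions, and outliers are drawn at random from a wide interval, whereas the model allows them to be arbitrary.

```python
import numpy as np


def sample_chebyshev(m, n, rng):
    """Draw m points from the product Chebyshev (arcsine) distribution on [-1, 1]^n.

    If U is uniform on [0, 1], then cos(pi * U) has density
    1 / (pi * sqrt(1 - x^2)) on [-1, 1].
    """
    return np.cos(np.pi * rng.random((m, n)))


def make_noisy_samples(p, m, n, sigma, rho, rng, outlier_scale=100.0):
    """Generate (x_i, y_i) pairs following the abstract's noise model.

    p: vectorized polynomial, maps an (m, n) array to an (m,) array.
    Inliers satisfy |y_i - p(x_i)| <= sigma.
    Each sample is an outlier independently with probability rho < 1/2.
    Assumption: outliers are uniform in [-outlier_scale, outlier_scale];
    the model itself permits arbitrary outlier values.
    """
    if not 0.0 <= rho < 0.5:
        raise ValueError("rho must lie in [0, 1/2)")
    x = sample_chebyshev(m, n, rng)
    # Bounded inlier noise of magnitude at most sigma.
    y = p(x) + rng.uniform(-sigma, sigma, size=m)
    # Mark each sample as an outlier independently with probability rho.
    is_outlier = rng.random(m) < rho
    y[is_outlier] = rng.uniform(
        -outlier_scale, outlier_scale, size=int(is_outlier.sum())
    )
    return x, y, is_outlier


# Example: a bivariate polynomial of degree at most 2 in each variable.
def example_poly(x):
    return x[:, 0] ** 2 * x[:, 1] - 0.5 * x[:, 1]


rng = np.random.default_rng(0)
x, y, is_outlier = make_noisy_samples(
    example_poly, m=1000, n=2, sigma=0.1, rho=0.2, rng=rng
)
```

The returned mask `is_outlier` is only for inspecting the generated data; a regression algorithm in this setting would see only `x` and `y`.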

