Outlier Robust Multivariate Polynomial Regression (2403.09465v1)
Abstract: We study the problem of robust multivariate polynomial regression: let $p\colon\mathbb{R}^n\to\mathbb{R}$ be an unknown $n$-variate polynomial of degree at most $d$ in each variable. We are given as input a set of random samples $(\mathbf{x}_i,y_i) \in [-1,1]^n \times \mathbb{R}$ that are noisy versions of $(\mathbf{x}_i,p(\mathbf{x}_i))$. More precisely, each $\mathbf{x}_i$ is sampled independently from some distribution $\chi$ on $[-1,1]^n$, and for each $i$ independently, $y_i$ is arbitrary (i.e., an outlier) with probability at most $\rho < 1/2$, and otherwise satisfies $|y_i-p(\mathbf{x}_i)|\leq\sigma$. The goal is to output a polynomial $\hat{p}$, of degree at most $d$ in each variable, within an $\ell_\infty$-distance of at most $O(\sigma)$ from $p$. Kane, Karmalkar, and Price [FOCS'17] solved this problem for $n=1$. We generalize their results to the $n$-variate setting, showing an algorithm that achieves a sample complexity of $O_n(d^n\log d)$, where the hidden constant depends on $n$, if $\chi$ is the $n$-dimensional Chebyshev distribution. The sample complexity is $O_n(d^{2n}\log d)$ if the samples are drawn from the uniform distribution instead. The approximation error is guaranteed to be at most $O(\sigma)$, and the run-time depends on $\log(1/\sigma)$. In the setting where each $\mathbf{x}_i$ and $y_i$ is known up to $N$ bits of precision, the run-time's dependence on $N$ is linear. We also show that our sample complexities are optimal in terms of $d^n$. Furthermore, we show that it is possible to have the run-time be independent of $1/\sigma$, at the cost of a higher sample complexity.
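For concreteness, the sampling and corruption model described in the abstract can be sketched in a few lines of NumPy. The snippet below only illustrates the input model: the helper names (`sample_chebyshev`, `cheb_features`, `corrupt`) and the uniform junk values standing in for adversarial outliers are our own illustrative choices, and nothing here implements the paper's estimator.

```python
# Sketch of the input model: n-variate samples on [-1,1]^n, a hidden polynomial
# of degree <= d in each variable, bounded noise sigma, and an outlier rate rho.
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

def sample_chebyshev(m, n):
    """Draw m points from the n-dimensional Chebyshev distribution on [-1,1]^n:
    each coordinate is cos(theta) with theta uniform on [0, pi]."""
    return np.cos(rng.uniform(0.0, np.pi, size=(m, n)))

def cheb_features(X, d):
    """Tensor-product Chebyshev basis T_{a_1}(x_1)...T_{a_n}(x_n) with 0 <= a_j <= d."""
    m, n = X.shape
    T = np.ones((m, n, d + 1))          # univariate values T_0..T_d per coordinate
    if d >= 1:
        T[:, :, 1] = X
    for k in range(2, d + 1):
        T[:, :, k] = 2 * X * T[:, :, k - 1] - T[:, :, k - 2]
    cols = [np.prod(T[np.arange(m)[:, None], np.arange(n), a], axis=1)
            for a in product(range(d + 1), repeat=n)]
    return np.stack(cols, axis=1)        # shape (m, (d+1)^n)

def corrupt(y_clean, rho, sigma):
    """Each label is an arbitrary outlier with probability rho; otherwise it
    satisfies |y_i - p(x_i)| <= sigma (here: uniform noise in [-sigma, sigma])."""
    m = y_clean.shape[0]
    outlier = rng.random(m) < rho
    y = y_clean + rng.uniform(-sigma, sigma, size=m)
    y[outlier] = rng.uniform(-100, 100, size=outlier.sum())  # stand-in for the adversary
    return y

# Example: n = 2 variables, degree d = 3 in each variable, rho = 0.2, sigma = 0.01.
n, d, rho, sigma = 2, 3, 0.2, 0.01
coeffs = rng.standard_normal((d + 1) ** n)   # hidden p, written in the Chebyshev basis
X = sample_chebyshev(2000, n)
y = corrupt(cheb_features(X, d) @ coeffs, rho, sigma)
```

Recovering `coeffs` from `(X, y)` up to $\ell_\infty$-error $O(\sigma)$, with the sample sizes stated above, is exactly the task the paper's algorithm addresses.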