Sparse Regression via Range Counting (1908.00351v2)
Abstract: The sparse regression problem, also known as the best subset selection problem, can be cast as follows: Given a set $S$ of $n$ points in $\mathbb{R}^d$, a point $y \in \mathbb{R}^d$, and an integer $2 \leq k \leq d$, find an affine combination of at most $k$ points of $S$ that is nearest to $y$. We describe an $O(n^{k-1} \log^{d-k+2} n)$-time randomized $(1+\varepsilon)$-approximation algorithm for this problem with $d$ and $\varepsilon$ constant. This is the first algorithm for this problem running in time $o(n^k)$. Its running time is similar to the query time of a data structure recently proposed by Har-Peled, Indyk, and Mahabadi (ICALP'18), while not requiring any preprocessing. Up to polylogarithmic factors, it matches a conditional lower bound relying on a conjecture about affine degeneracy testing. In the special case where $k = d = O(1)$, we also provide a simple $O_\delta(n^{d-1+\delta})$-time deterministic exact algorithm, for any $\delta > 0$. Finally, we show how to adapt the approximation algorithm to the sparse linear regression and sparse convex regression problems with the same running time, up to polylogarithmic factors.
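To make the problem statement concrete, here is a minimal Python sketch of the naive exact baseline: enumerate every $k$-subset of $S$ and, for each, find the nearest affine combination by eliminating the constraint $\sum_i \lambda_i = 1$ and solving an ordinary least-squares problem. This is the roughly $O(n^k)$-time brute force that the paper's algorithm improves on, not the paper's algorithm itself; the function name and interface are illustrative assumptions.

import itertools

import numpy as np


def brute_force_sparse_regression(S, y, k):
    """Illustrative O(n^k)-time exact baseline (not the paper's algorithm).

    S: (n, d) array of points, y: (d,) target, k: sparsity (2 <= k <= d).
    Returns (distance, chosen index tuple, affine coefficients summing to 1).
    """
    best_dist, best_idx, best_coeffs = np.inf, None, None
    for idx in itertools.combinations(range(len(S)), k):
        P = S[list(idx)]                  # the k chosen points, shape (k, d)
        # An affine combination sum_i l_i * P[i] with sum_i l_i = 1 can be
        # rewritten, substituting l_{k-1} = 1 - (sum of the others), as
        #   P[-1] + sum_{i < k-1} l_i * (P[i] - P[-1]),
        # so the combination nearest to y solves an ordinary least-squares
        # problem with matrix columns P[i] - P[-1] and target y - P[-1].
        A = (P[:-1] - P[-1]).T            # shape (d, k-1)
        b = y - P[-1]
        lam, *_ = np.linalg.lstsq(A, b, rcond=None)
        coeffs = np.append(lam, 1.0 - lam.sum())
        dist = np.linalg.norm(coeffs @ P - y)
        if dist < best_dist:
            best_dist, best_idx, best_coeffs = dist, idx, coeffs
    return best_dist, best_idx, best_coeffs


# Tiny usage example on random data (all values here are arbitrary).
rng = np.random.default_rng(0)
S = rng.standard_normal((20, 4))          # n = 20 points in R^4
y = rng.standard_normal(4)
dist, idx, coeffs = brute_force_sparse_regression(S, y, k=2)
print(dist, idx, coeffs)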