Sampling Algorithms and Coresets for Lp Regression

(0707.1714)
Published Jul 11, 2007 in cs.DS

Abstract

The Lp regression problem takes as input a matrix $A \in \mathbb{R}^{n \times d}$, a vector $b \in \mathbb{R}^n$, and a number $p \in [1,\infty)$, and it returns as output a number ${\cal Z}$ and a vector $x_{opt} \in \mathbb{R}^d$ such that ${\cal Z} = \min_{x \in \mathbb{R}^d} \|Ax - b\|_p = \|Ax_{opt} - b\|_p$. In this paper, we construct coresets and obtain an efficient two-stage sampling-based approximation algorithm for the very overconstrained ($n \gg d$) version of this classical problem, for all $p \in [1, \infty)$. The first stage of our algorithm non-uniformly samples $\hat{r}_1 = O(36^p d^{\max\{p/2+1,\, p\}+1})$ rows of $A$ and the corresponding elements of $b$, and then it solves the Lp regression problem on the sample; we prove this is an 8-approximation. The second stage of our algorithm uses the output of the first stage to resample $\hat{r}_1/\epsilon^2$ constraints, and then it solves the Lp regression problem on the new sample; we prove this is a $(1+\epsilon)$-approximation. Our algorithm unifies, improves upon, and extends the existing algorithms for special cases of Lp regression, namely $p = 1, 2$. In the course of proving our result, we develop two concepts--well-conditioned bases and subspace-preserving sampling--that are of independent interest.
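To make the two-stage sample-and-solve scheme described above concrete, here is a minimal illustrative sketch in Python (NumPy/SciPy). It is not the paper's algorithm: the sampling weights below come from a QR factorization of $A$, used only as a simple stand-in for the well-conditioned bases the paper constructs, and the stage-2 probabilities (mixing basis weights with stage-1 residuals) are an assumed simplification. The function names, parameters, and the generic `lp_regression` solver are hypothetical, and the paper's approximation guarantees do not carry over to this heuristic.

```python
import numpy as np
from scipy.optimize import minimize


def lp_regression(A, b, p):
    """Solve min_x ||Ax - b||_p directly; intended only for small sampled problems."""
    x0 = np.linalg.lstsq(A, b, rcond=None)[0]           # least-squares warm start
    obj = lambda x: np.sum(np.abs(A @ x - b) ** p)      # p-th power of the Lp residual norm
    return minimize(obj, x0, method="Powell").x


def two_stage_lp_sample(A, b, p, r1, eps, seed=0):
    """Illustrative two-stage row-sampling scheme for overconstrained Lp regression."""
    rng = np.random.default_rng(seed)
    n, _ = A.shape

    # Stage 1: non-uniform row sampling from a basis-derived distribution,
    # then solve Lp regression on the rescaled sample (a coarse approximation).
    Q, _ = np.linalg.qr(A)                               # orthonormal basis for span(A)
    w = np.sum(np.abs(Q) ** p, axis=1)                   # stand-in for well-conditioned-basis weights
    q1 = np.minimum(1.0, r1 * w / w.sum())               # row inclusion probabilities
    keep1 = rng.random(n) < q1
    s1 = 1.0 / q1[keep1] ** (1.0 / p)                    # rescale kept rows to keep norms unbiased
    x1 = lp_regression(s1[:, None] * A[keep1], s1 * b[keep1], p)

    # Stage 2: resample roughly r1 / eps^2 constraints, biased toward rows with
    # large residuals under the stage-1 solution, and solve again on the new sample.
    res = np.abs(A @ x1 - b) ** p
    r2 = int(np.ceil(r1 / eps ** 2))
    q2 = np.minimum(1.0, r2 * 0.5 * (w / w.sum() + res / res.sum()))
    keep2 = rng.random(n) < q2
    s2 = 1.0 / q2[keep2] ** (1.0 / p)
    return lp_regression(s2[:, None] * A[keep2], s2 * b[keep2], p)
```

A typical call on a tall problem would be `two_stage_lp_sample(A, b, p=1.5, r1=200, eps=0.1)` with $n \gg d$; both stages only ever hand a few hundred rows to the inner solver, which is the point of the sampling approach.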

