
Online Lewis Weight Sampling

(2207.08268)
Published Jul 17, 2022 in cs.DS, cs.LG, and stat.ML

Abstract

The seminal work of Cohen and Peng introduced Lewis weight sampling to the theoretical computer science community, yielding fast row sampling algorithms for approximating $d$-dimensional subspaces of $\ell_p$ up to $(1+\epsilon)$ error. Several works have extended this important primitive to other settings, including the online coreset and sliding window models. However, these results hold only for $p\in\{1,2\}$, and the results for $p=1$ require $\tilde O(d^2/\epsilon^2)$ samples, which is suboptimal. In this work, we design the first nearly optimal $\ell_p$ subspace embeddings for all $p\in(0,\infty)$ in the online coreset and sliding window models. In both models, our algorithms store $\tilde O(d^{1\lor(p/2)}/\epsilon^2)$ rows. This answers a substantial generalization of the main open question of [BDMMUWZ2020], and gives the first results for all $p\notin\{1,2\}$. Towards our result, we give the first analysis of "one-shot" Lewis weight sampling, in which rows are sampled proportionally to their Lewis weights, with sample complexity $\tilde O(d^{p/2}/\epsilon^2)$ for $p>2$. Previously, this scheme was only known to have sample complexity $\tilde O(d^{p/2}/\epsilon^5)$, whereas $\tilde O(d^{p/2}/\epsilon^2)$ is known if a more sophisticated recursive sampling is used. The recursive sampling cannot be implemented online, thus necessitating an analysis of one-shot Lewis weight sampling. Our analysis uses a novel connection to online numerical linear algebra. As an application, we obtain the first one-pass streaming coreset algorithms for $(1+\epsilon)$ approximation of important generalized linear models, such as logistic regression and $p$-probit regression. Our upper bounds are parameterized by a complexity parameter $\mu$ introduced by [MSSW2018], and we show the first lower bounds demonstrating that a linear dependence on $\mu$ is necessary.
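To make the sampling scheme concrete, the following is a minimal Python/NumPy sketch of one-shot Lewis weight sampling. It is not the paper's implementation: the weights are approximated with the Cohen-Peng fixed-point iteration (known to converge for $p<4$; larger $p$ requires a different subroutine), and the function names, iteration count, and sample-size parameter `m` are illustrative choices rather than anything specified in the paper.

```python
import numpy as np

def lewis_weights(A, p, iters=30):
    """Approximate the l_p Lewis weights of A by the Cohen-Peng
    fixed-point iteration  w_i <- (a_i^T (A^T W^{1-2/p} A)^{-1} a_i)^{p/2},
    where W = diag(w).  Converges for p < 4; p = 2 gives leverage scores."""
    n, d = A.shape
    w = np.full(n, d / n)  # any positive start; the true weights sum to d
    for _ in range(iters):
        Wpow = w ** (1.0 - 2.0 / p)                 # diagonal of W^{1-2/p}
        M = A.T @ (Wpow[:, None] * A)               # A^T W^{1-2/p} A, a d x d matrix
        Minv = np.linalg.pinv(M)                    # pseudo-inverse for robustness
        tau = np.einsum('ij,jk,ik->i', A, Minv, A)  # tau_i = a_i^T M^{-1} a_i
        w = np.maximum(tau, 1e-12) ** (p / 2.0)
    return w

def one_shot_lewis_sample(A, p, m, seed=None):
    """One-shot Lewis weight sampling: keep row i independently with
    probability q_i = min(1, m * w_i / sum(w)), rescaling each kept row
    by q_i^{-1/p} so that E[||SAx||_p^p] = ||Ax||_p^p."""
    rng = np.random.default_rng(seed)
    w = lewis_weights(A, p)
    q = np.minimum(1.0, m * w / w.sum())
    keep = rng.random(len(q)) < q
    return A[keep] * q[keep, None] ** (-1.0 / p)

# Sanity check: ||SAx||_p should approximate ||Ax||_p on a random instance.
rng = np.random.default_rng(0)
A = rng.standard_normal((50000, 20))
SA = one_shot_lewis_sample(A, p=3, m=5000, seed=1)
x = rng.standard_normal(20)
print(np.linalg.norm(A @ x, 3), np.linalg.norm(SA @ x, 3))
```

Taking $m = \tilde O(d^{p/2}/\epsilon^2)$ (for $p>2$) matches the one-shot sample complexity analyzed in the paper; the $q_i^{-1/p}$ rescaling makes $\|SAx\|_p^p$ an unbiased estimator of $\|Ax\|_p^p$ for every fixed $x$.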
