Emergent Mind

Perfect $L_p$ Sampling in a Data Stream

(1808.05497)
Published Aug 16, 2018 in cs.DS

Abstract

In this paper, we resolve the one-pass space complexity of $Lp$ sampling for $p \in (0,2)$. Given a stream of updates (insertions and deletions) to the coordinates of an underlying vector $f \in \mathbb{R}n$, a perfect $Lp$ sampler must output an index $i$ with probability $|fi|p/|f|pp$, and is allowed to fail with some probability $\delta$. So far, for $p > 0$ no algorithm has been shown to solve the problem exactly using $\text{poly}( \log n)$-bits of space. In 2010, Monemizadeh and Woodruff introduced an approximate $Lp$ sampler, which outputs $i$ with probability $(1 \pm \nu)|fi|p /|f|pp$, using space polynomial in $\nu{-1}$ and $\log(n)$. The space complexity was later reduced by Jowhari, Sa\u{g}lam, and Tardos to roughly $O(\nu{-p} \log2 n \log \delta{-1})$ for $p \in (0,2)$, which tightly matches the $\Omega(\log2 n \log \delta{-1})$ lower bound in terms of $n$ and $\delta$, but is loose in terms of $\nu$. Given these nearly tight bounds, it is perhaps surprising that no lower bound exists in terms of $\nu$not even a bound of $\Omega(\nu{-1})$ is known. In this paper, we explain this phenomenon by demonstrating the existence of an $O(\log2 n \log \delta{-1})$-bit perfect $Lp$ sampler for $p \in (0,2)$. This shows that $\nu$ need not factor into the space of an $L_p$ sampler, which closes the complexity of the problem for this range of $p$. For $p=2$, our bound is $O(\log3 n \log \delta{-1})$-bits, which matches the prior best known upper bound in terms of $n,\delta$, but has no dependence on $\nu$. For $p<2$, our bound holds in the random oracle model, matching the lower bounds in that model. Moreover, we show that our algorithm can be derandomized with only a $O((\log \log n)2)$ blow-up in the space (and no blow-up for $p=2$). Our derandomization technique is general, and can be used to derandomize a large class of linear sketches.

We're not able to analyze this paper right now due to high demand.

Please check back later (sorry!).

Generate a summary of this paper on our Pro plan:

We ran into a problem analyzing this paper.

Newsletter

Get summaries of trending comp sci papers delivered straight to your inbox:

Unsubscribe anytime.