Emergent Mind

Heavy hitters via cluster-preserving clustering

(1604.01357)
Published Apr 5, 2016 in cs.DS and cs.LG

Abstract

In turnstile $\ellp$ $\varepsilon$-heavy hitters, one maintains a high-dimensional $x\in\mathbb{R}n$ subject to $\texttt{update}(i,\Delta)$ causing $xi\leftarrow xi + \Delta$, where $i\in[n]$, $\Delta\in\mathbb{R}$. Upon receiving a query, the goal is to report a small list $L\subset[n]$, $|L| = O(1/\varepsilonp)$, containing every "heavy hitter" $i\in[n]$ with $|xi| \ge \varepsilon |x{\overline{1/\varepsilonp}}|p$, where $x{\overline{k}}$ denotes the vector obtained by zeroing out the largest $k$ entries of $x$ in magnitude. For any $p\in(0,2]$ the CountSketch solves $\ellp$ heavy hitters using $O(\varepsilon{-p}\log n)$ words of space with $O(\log n)$ update time, $O(n\log n)$ query time to output $L$, and whose output after any query is correct with high probability (whp) $1 - 1/poly(n)$. Unfortunately the query time is very slow. To remedy this, the work [CM05] proposed for $p=1$ in the strict turnstile model, a whp correct algorithm achieving suboptimal space $O(\varepsilon{-1}\log2 n)$, worse update time $O(\log2 n)$, but much better query time $O(\varepsilon{-1}poly(\log n))$. We show this tradeoff between space and update time versus query time is unnecessary. We provide a new algorithm, ExpanderSketch, which in the most general turnstile model achieves optimal $O(\varepsilon{-p}\log n)$ space, $O(\log n)$ update time, and fast $O(\varepsilon{-p}poly(\log n))$ query time, and whp correctness. Our main innovation is an efficient reduction from the heavy hitters to a clustering problem in which each heavy hitter is encoded as some form of noisy spectral cluster in a much bigger graph, and the goal is to identify every cluster. Since every heavy hitter must be found, correctness requires that every cluster be found. We then develop a "cluster-preserving clustering" algorithm, partitioning the graph into clusters without destroying any original cluster.

We're not able to analyze this paper right now due to high demand.

Please check back later (sorry!).

Generate a summary of this paper on our Pro plan:

We ran into a problem analyzing this paper.

Newsletter

Get summaries of trending comp sci papers delivered straight to your inbox:

Unsubscribe anytime.