Frequency Estimation with One-Sided Error (2111.03953v1)
Abstract: Frequency estimation is one of the most fundamental problems in streaming algorithms. Given a stream $S$ of elements from some universe $U={1 \ldots n}$, the goal is to compute, in a single pass, a short sketch of $S$ so that for any element $i \in U$, one can estimate the number $x_i$ of times $i$ occurs in $S$ based on the sketch alone. Two state of the art solutions to this problems are the Count-Min and Count-Sketch algorithms. The frequency estimator $\tilde{x}$ produced by Count-Min, using $O(1/\varepsilon \cdot \log n)$ dimensions, guarantees that $|\tilde{x}-x|{\infty} \le \varepsilon |x|_1$ with high probability, and $\tilde{x} \ge x$ holds deterministically. Also, Count-Min works under the assumption that $x \ge 0$. On the other hand, Count-Sketch, using $O(1/\varepsilon2 \cdot \log n)$ dimensions, guarantees that $|\tilde{x}-x|{\infty} \le \varepsilon |x|_2$ with high probability. A natural question is whether it is possible to design the best of both worlds sketching method, with error guarantees depending on the $\ell_2$ norm and space comparable to Count-Sketch, but (like Count-Min) also has the no-underestimation property. Our main set of results shows that the answer to the above question is negative. We show this in two incomparable computational models: linear sketching and streaming algorithms. We also study the complementary problem, where the sketch is required to not over-estimate, i.e., $\tilde{x} \le x$ should hold always.
Collections
Sign up for free to add this paper to one or more collections.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.