Count-Min: Optimal Estimation and Tight Error Bounds using Empirical Error Distributions

Published 9 Nov 2018 in cs.DS and stat.ME | (1811.04150v1)

Abstract: The Count-Min sketch is an important and well-studied data summarization method. It allows one to estimate the count of any item in a stream using a small, fixed-size data sketch. However, the accuracy of the sketch depends on characteristics of the underlying data. This has led to a number of count estimation procedures which work well in one scenario but perform poorly in others. A practitioner is faced with two basic, unanswered questions. Which variant should be chosen when the data is unknown? Given an estimate, is its error sufficiently small to be trustworthy? We provide answers to these questions. We derive new count estimators, including a provably optimal estimator, which outperform or match previous estimators in all scenarios. We also provide practical, tight error bounds at query time for both new and existing estimators. These error estimates also yield procedures to choose the sketch tuning parameters optimally, as they can extrapolate the error to different choices of sketch width and depth. The key observation is that the distribution of errors in each counter can be empirically estimated from the sketch itself. By first estimating this distribution, count estimation becomes a statistical estimation and inference problem with a known error distribution. This provides both a principled way to derive new and optimal estimators and a way to study the error and properties of existing estimators.
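For readers unfamiliar with the data structure the abstract refers to, the following is a minimal sketch of a standard Count-Min sketch with the classic minimum-of-counters estimator. It is only a baseline illustration, not the paper's new or optimal estimators, and it does not implement the error bounds. The class name `CountMinSketch`, the salted use of Python's built-in `hash` in place of a proper pairwise-independent hash family, and the synthetic heavy-tailed stream are all assumptions made for this example.

```python
import random
from collections import Counter


class CountMinSketch:
    """Baseline Count-Min sketch: a depth x width table of counters."""

    def __init__(self, width, depth, seed=0):
        self.width = width
        self.depth = depth
        rng = random.Random(seed)
        # Assumption: one random salt per row stands in for an independent
        # hash function per row.
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Map the item to one counter in the given row.
        return hash((self.salts[row], item)) % self.width

    def add(self, item, count=1):
        # Each item increments exactly one counter in every row.
        for r in range(self.depth):
            self.table[r][self._index(r, item)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum over rows
        # is the least-contaminated counter and never underestimates.
        return min(self.table[r][self._index(r, item)]
                   for r in range(self.depth))


# Synthetic heavy-tailed stream of integer items (illustrative data only).
rng = random.Random(42)
stream = [int(rng.paretovariate(1.2)) for _ in range(10_000)]
truth = Counter(stream)

sketch = CountMinSketch(width=256, depth=4)
for x in stream:
    sketch.add(x)

# Per-item overestimation error of the baseline estimator.
errors = {x: sketch.estimate(x) - truth[x] for x in truth}
print("largest overestimate:", max(errors.values()))
```

The width and depth arguments are the tuning parameters the abstract mentions: a wider table reduces collisions per counter, while more rows give the minimum more counters to choose from. How skewed the stream is determines how large the errors are in practice, which is the data dependence the abstract describes.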

Citations (33)

Authors (1)

Daniel Ting