
Count-Min Sketch with Conservative Updates: Worst-Case Analysis (2405.12034v2)

Published 20 May 2024 in cs.DS and cs.PF

Abstract: Count-Min Sketch with Conservative Updates (CMS-CU) is a memory-efficient hash-based data structure used to estimate the occurrences of items within a data stream. CMS-CU stores $m$ counters and employs $d$ hash functions to map items to these counters. We first argue that the estimation error in CMS-CU is maximal when each item appears at most once in the stream. Next, we study CMS-CU in this setting. In the case where $d=m-1$, we prove that the average estimation error and the average counter rate converge almost surely to $\frac{1}{2}$, contrasting with the vanilla Count-Min Sketch, where the average counter rate is equal to $\frac{m-1}{m}$. For any given $m$ and $d$, we prove novel lower and upper bounds on the average estimation error, incorporating a positive integer parameter $g$. Larger values of this parameter improve the accuracy of the bounds. Moreover, the computation of each bound involves examining an ergodic Markov process with a state space of size $\binom{m+g-d}{g}$ and a sparse transition probability matrix containing $\mathcal{O}(m\binom{m+g-d}{g})$ non-zero entries. For $d=m-1$, $g=1$, and as $m\to \infty$, we show that the lower and upper bounds coincide. In general, our bounds exhibit high accuracy for small values of $g$, as shown by numerical computation. For example, for $m=50$, $d=4$, and $g=5$, the difference between the lower and upper bounds is smaller than $10^{-4}$.
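To make the data structure in the abstract concrete, here is a minimal Python sketch of CMS-CU. It is not the authors' code and rests on several assumptions: a single array of $m$ counters, ideal hashing modelled as $d$ distinct counters drawn uniformly at random per item (memoised in a dictionary, which a real sketch would replace with hash functions), and "average counter rate" read as the mean counter value divided by the number of inserted items. The names `CountMinCU`, `counter_rate`, and `state_space_size` are hypothetical.

```python
import math
import random


class CountMinCU:
    """Illustrative single-array Count-Min Sketch with m counters and d hashes."""

    def __init__(self, m, d, conservative=True, seed=0):
        if not 1 <= d <= m:
            raise ValueError("need 1 <= d <= m")
        self.m, self.d = m, d
        self.conservative = conservative
        self.counters = [0] * m
        self._rng = random.Random(seed)
        self._slots = {}  # item -> its d counter indices (stand-in for d hash functions)

    def _indices(self, item):
        # Assumption: ideal hashing, i.e. d distinct counters chosen uniformly at random.
        if item not in self._slots:
            self._slots[item] = self._rng.sample(range(self.m), self.d)
        return self._slots[item]

    def estimate(self, item):
        # The estimate is the minimum over the item's d counters.
        return min(self.counters[i] for i in self._indices(item))

    def add(self, item):
        idx = self._indices(item)
        if not self.conservative:
            # Vanilla Count-Min Sketch: increment all d counters.
            for i in idx:
                self.counters[i] += 1
            return
        # Conservative update: increment only the counters equal to the current minimum.
        low = min(self.counters[i] for i in idx)
        for i in idx:
            if self.counters[i] == low:
                self.counters[i] += 1


def counter_rate(m, d, n_items, conservative=True, seed=0):
    """Mean counter value per inserted item, for a stream of n_items distinct items."""
    sketch = CountMinCU(m, d, conservative=conservative, seed=seed)
    for item in range(n_items):
        sketch.add(item)
    return sum(sketch.counters) / (m * n_items)


def state_space_size(m, d, g):
    """Size of the Markov process state space quoted in the abstract: C(m+g-d, g)."""
    return math.comb(m + g - d, g)
```

Under these assumptions, `counter_rate(m, m - 1, n, conservative=False)` equals $\frac{m-1}{m}$ exactly, since every item increments $m-1$ counters, whereas the conservative variant with $d=m-1$ should drift toward the $\frac{1}{2}$ limit stated in the abstract as `n` grows. `state_space_size(50, 4, 5)` evaluates the $\binom{m+g-d}{g}$ formula for the abstract's numerical example.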

