
Efficient Correlation Clustering Methods for Large Consensus Clustering Instances (2307.03818v1)

Published 7 Jul 2023 in cs.DS

Abstract: Consensus clustering (or clustering aggregation) inputs $k$ partitions of a given ground set $V$, and seeks to create a single partition that minimizes disagreement with all input partitions. State-of-the-art algorithms for consensus clustering are based on correlation clustering methods like the popular Pivot algorithm. Unfortunately these methods have not proved to be practical for consensus clustering instances where either $k$ or $|V|$ gets large. In this paper we provide practical run time improvements for correlation clustering solvers when $|V|$ is large. We reduce the time complexity of Pivot from $O(|V|^2 k)$ to $O(|V| k)$, and its space complexity from $O(|V|^2)$ to $O(|V| k)$ -- a significant savings since in practice $k$ is much less than $|V|$. We also analyze a sampling method for these algorithms when $k$ is large, bridging the gap between running Pivot on the full set of input partitions (an expected 1.57-approximation) and choosing a single input partition at random (an expected 2-approximation). We show experimentally that algorithms like Pivot do obtain quality clustering results in practice even on small samples of input partitions.
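To make the setting in the abstract concrete, the following is a minimal Python sketch of Pivot applied to a consensus clustering instance. It is an illustration under stated assumptions, not the paper's implementation: the function name `consensus_pivot`, the `sample_size` parameter, and the strict-majority rule for deciding that two elements belong together are all choices made here. The sketch keeps only the $k$ label arrays in memory, so it uses $O(|V| k)$ space instead of materializing a $|V| \times |V|$ graph, but it still compares each pivot with every unclustered element and therefore does not achieve the $O(|V| k)$ running time claimed in the abstract.

```python
import random
from typing import List, Optional, Sequence


def consensus_pivot(
    partitions: Sequence[Sequence[int]],
    sample_size: Optional[int] = None,
    seed: Optional[int] = None,
) -> List[int]:
    """Pivot on a consensus clustering instance (illustrative sketch).

    partitions[i][v] is the cluster label of element v in input partition i.
    Two elements are treated as "similar" when strictly more than half of
    the (possibly sampled) partitions place them in the same cluster; this
    tie-breaking rule is an assumption of the sketch.
    """
    rng = random.Random(seed)

    # Optional sampling step: run Pivot on a random subset of the inputs.
    if sample_size is not None and sample_size < len(partitions):
        partitions = rng.sample(list(partitions), sample_size)

    k = len(partitions)
    n = len(partitions[0])

    # Visit elements in uniformly random order; the first unclustered
    # element encountered becomes the next pivot.
    order = list(range(n))
    rng.shuffle(order)

    cluster = [-1] * n  # -1 marks "not yet clustered"
    next_id = 0
    for pivot in order:
        if cluster[pivot] != -1:
            continue
        cluster[pivot] = next_id
        for v in range(n):
            if cluster[v] != -1:
                continue
            # Edge sign is computed on demand from the labels, in O(k) time.
            together = sum(1 for part in partitions if part[pivot] == part[v])
            if 2 * together > k:
                cluster[v] = next_id
        next_id += 1
    return cluster
```

For example, `consensus_pivot([[0, 0, 1, 1], [0, 0, 1, 2], [0, 0, 1, 1]], seed=0)` returns one cluster label per element. Setting `sample_size=1` corresponds to the abstract's baseline of choosing a single input partition at random, while leaving it as `None` runs Pivot on the full set of input partitions.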

