
Moderate Dimension Reduction for $k$-Center Clustering (2312.01391v5)

Published 3 Dec 2023 in cs.DS

Abstract: The Johnson-Lindenstrauss (JL) Lemma introduced the concept of dimension reduction via a random linear map, which has become a fundamental technique in many computational settings. For a set of $n$ points in $\mathbb{R}^d$ and any fixed $\epsilon>0$, it reduces the dimension $d$ to $O(\log n)$ while preserving, with high probability, all the pairwise Euclidean distances within factor $1+\epsilon$. Perhaps surprisingly, the target dimension can be lower if one only wishes to preserve the optimal value of a certain problem on the pointset, e.g., Euclidean max-cut or $k$-means. However, for some notorious problems, like diameter (aka furthest pair), dimension reduction via the JL map to below $O(\log n)$ does not preserve the optimal value within factor $1+\epsilon$. We propose to focus on another regime, of \emph{moderate dimension reduction}, where a problem's value is preserved within factor $\alpha>1$ using target dimension $\tfrac{\log n}{\mathrm{poly}(\alpha)}$. We establish the viability of this approach and show that the famous $k$-center problem is $\alpha$-approximated when reducing to dimension $O(\tfrac{\log n}{\alpha^2}+\log k)$. Along the way, we address the diameter problem via the special case $k=1$. Our result extends to several important variants of $k$-center (with outliers, capacities, or fairness constraints), and the bound improves further with the input's doubling dimension. While our $\mathrm{poly}(\alpha)$-factor improvement in the dimension may seem small, it actually has significant implications for streaming algorithms, and easily yields an algorithm for $k$-center in dynamic geometric streams that achieves $O(\alpha)$-approximation using space $\mathrm{poly}(k \cdot d \cdot n^{1/\alpha^2})$. This is the first algorithm to beat $O(n)$ space in high dimension $d$, as all previous algorithms require space at least $\exp(d)$.
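The pipeline the abstract describes — project the input with a random linear map down to roughly $\log n / \alpha^2$ dimensions, then solve $k$-center in the reduced space — can be sketched as follows. This is an illustrative sketch, not the paper's specific construction: `jl_project` is a generic Gaussian JL-style projection, and the solver is Gonzalez's classical greedy 2-approximation for $k$-center rather than the streaming algorithm from the paper.

```python
import numpy as np

def jl_project(points, target_dim, seed=None):
    """Project points to target_dim via a random Gaussian map (JL-style).

    A generic JL sketch: each entry of G is N(0, 1/target_dim), which
    preserves squared norms in expectation. Not the paper's construction.
    """
    rng = np.random.default_rng(seed)
    n, d = points.shape
    G = rng.normal(size=(d, target_dim)) / np.sqrt(target_dim)
    return points @ G

def gonzalez_k_center(points, k):
    """Greedy farthest-point heuristic for k-center (Gonzalez, 1985).

    Returns the chosen center indices and the clustering radius
    (max distance of any point to its nearest center); the radius is
    within factor 2 of the optimal k-center value.
    """
    centers = [0]  # start from an arbitrary point
    dists = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        far = int(np.argmax(dists))  # farthest point becomes next center
        centers.append(far)
        dists = np.minimum(dists, np.linalg.norm(points - points[far], axis=1))
    return centers, float(dists.max())

# Usage: reduce dimension first, then cluster in the low-dimensional space.
pts = np.random.default_rng(0).normal(size=(200, 50))
low = jl_project(pts, target_dim=10, seed=1)
centers, radius = gonzalez_k_center(low, k=5)
```

Under moderate dimension reduction, the radius computed in the reduced space approximates the original $k$-center value within factor $\alpha$, so the overall guarantee degrades only by a constant times $\alpha$.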
