Statistically Optimal K-means Clustering via Nonnegative Low-rank Semidefinite Programming (2305.18436v5)
Abstract: $K$-means clustering is a widely used machine learning method for identifying patterns in large datasets. Recently, semidefinite programming (SDP) relaxations have been proposed for solving the $K$-means optimization problem; these relaxations enjoy strong statistical optimality guarantees. However, the prohibitive cost of implementing an SDP solver renders these guarantees inaccessible for practical datasets. In contrast, nonnegative matrix factorization (NMF) is a simple clustering algorithm widely used by machine learning practitioners, but it lacks a solid statistical underpinning and theoretical guarantees. In this paper, we consider an NMF-like algorithm that solves a nonnegative low-rank restriction of the SDP-relaxed $K$-means formulation using a nonconvex Burer--Monteiro factorization approach. The resulting algorithm is as simple and scalable as state-of-the-art NMF algorithms, while also enjoying the same strong statistical optimality guarantees as the SDP. In our experiments, we observe that our algorithm achieves significantly smaller mis-clustering errors than existing state-of-the-art methods while maintaining scalability.
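The abstract describes the method only at a high level. As a minimal sketch, assume the standard Peng and Wei form of the $K$-means SDP: maximize $\langle XX^\top, Z\rangle$ subject to $Z \succeq 0$, $Z\mathbf{1} = \mathbf{1}$, $\operatorname{tr}(Z) = K$, $Z \ge 0$. Substituting $Z = UU^\top$ with a nonnegative $U \in \mathbb{R}^{n\times r}$ makes $Z \succeq 0$ and $Z \ge 0$ automatic and turns $\operatorname{tr}(Z) = K$ into $\|U\|_F^2 = K$. The Python/NumPy code below handles the remaining constraint $UU^\top\mathbf{1} = \mathbf{1}$ with an augmented Lagrangian and the other two by projected gradient steps. The function names `project_nonneg_sphere` and `bm_kmeans`, the step size, the penalty `beta`, the iteration counts, and the row-wise argmax rounding are hypothetical illustrative choices, not the authors' algorithm or settings, and no convergence guarantee is implied for them.

```python
import numpy as np


def project_nonneg_sphere(V, K):
    """Euclidean projection of V onto {U >= 0, ||U||_F^2 = K}."""
    P = np.maximum(V, 0.0)  # clip to the nonnegative orthant
    norm = np.linalg.norm(P)
    if norm > 0:
        return np.sqrt(K) * P / norm  # then rescale onto the sphere
    # All entries <= 0: the closest feasible point puts all mass on the largest entry.
    P = np.zeros_like(V)
    P[np.unravel_index(np.argmax(V), V.shape)] = np.sqrt(K)
    return P


def bm_kmeans(X, K, r=None, beta=1.0, outer=30, inner=200, seed=0):
    """Hypothetical sketch of a nonnegative Burer-Monteiro K-means solver.

    Approximately solves  min_U  -<A, U U^T>
        s.t.  U U^T 1 = 1,  ||U||_F^2 = K,  U >= 0,  with A = X X^T,
    via an augmented Lagrangian on the row-sum constraint and projected
    gradient steps for the remaining constraints. Parameters are illustrative.
    """
    n = X.shape[0]
    r = K if r is None else r
    # Spectral norm of A = X X^T; dividing the objective by a positive
    # constant does not change its constrained minimizers.
    scale = np.linalg.norm(X, 2) ** 2
    ones = np.ones(n)
    # Conservative heuristic step size (an assumption, not a tuned value).
    step = 1.0 / (2.0 + beta * n * (1.0 + np.sqrt(K)) ** 2)
    rng = np.random.default_rng(seed)
    U = project_nonneg_sphere(rng.random((n, r)), K)
    y = np.zeros(n)  # Lagrange multipliers for U U^T 1 = 1
    for _ in range(outer):
        for _ in range(inner):
            c = U @ (U.T @ ones) - ones  # constraint residual U U^T 1 - 1
            w = y + beta * c
            AU = X @ (X.T @ U) / scale  # A U without forming the n x n matrix A
            # Gradient of -<A, UU^T> + <y, c> + (beta/2) ||c||^2 with respect to U.
            grad = -2.0 * AU + np.outer(w, U.T @ ones) + np.outer(ones, U.T @ w)
            U = project_nonneg_sphere(U - step * grad, K)
        y = y + beta * (U @ (U.T @ ones) - ones)  # dual (multiplier) update
    labels = np.argmax(U, axis=1)  # simple rounding step (an assumption)
    return U, labels
```

For example, stacking two Gaussian blobs into `X` and calling `bm_kmeans(X, K=2)` returns a nonnegative `U` with $\|U\|_F^2 = 2$ and one label per row. The projection onto the nonnegative orthant intersected with the Frobenius sphere is just a clip followed by a rescale, so each iteration costs about as much as an NMF update; computing `X @ (X.T @ U)` instead of forming $A$ keeps the per-iteration cost at $O(ndr)$ for $d$-dimensional data.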