Do you know what q-means? (2308.09701v2)
Abstract: Clustering is one of the most important tools for analysis of large datasets, and perhaps the most popular clustering algorithm is Lloyd's iteration for $k$-means. This iteration takes $n$ vectors $V=[v_1,\dots,v_n]\in\mathbb{R}{n\times d}$ and outputs $k$ centroids $c_1,\dots,c_k\in\mathbb{R}d$; these partition the vectors into clusters based on which centroid is closest to a particular vector. We present an overall improved version of the "$q$-means" algorithm, the quantum algorithm originally proposed by Kerenidis, Landman, Luongo, and Prakash (NeurIPS'19) which performs $\varepsilon$-$k$-means, an approximate version of $k$-means clustering. Our algorithm does not rely on quantum linear algebra primitives of prior work, but instead only uses QRAM to prepare simple states based on the current iteration's clusters and multivariate quantum amplitude estimation. The time complexity is $\widetilde{O}\big(\frac{|V|_F}{\sqrt{n}}\frac{k{5/2}d}{\varepsilon}(\sqrt{k} + \log{n})\big)$ and maintains the logarithmic dependence on $n$ while improving the dependence on most of the other parameters. We also present a "dequantized" algorithm for $\varepsilon$-$k$-means which runs in $O\big(\frac{|V|_F2}{n}\frac{k{2}}{\varepsilon2}(kd + \log{n})\big)$ time. Notably, this classical algorithm matches the logarithmic dependence on $n$ attained by the quantum algorithm.