
Improved Coresets for Euclidean $k$-Means

(2211.08184)
Published Nov 15, 2022 in cs.CG and cs.LG

Abstract

Given a set of $n$ points in $d$ dimensions, the Euclidean $k$-means problem (resp. the Euclidean $k$-median problem) consists of finding $k$ centers such that the sum of squared distances (resp. sum of distances) from every point to its closest center is minimized. Arguably the most popular way of dealing with this problem in the big-data setting is to first compress the data by computing a weighted subset known as a coreset and then run any algorithm on this subset. The guarantee of the coreset is that, for any candidate solution, the ratio between the coreset cost and the cost of the original instance is within a $(1\pm \varepsilon)$ factor. The current state-of-the-art coreset size is $\tilde O(\min(k^{2} \cdot \varepsilon^{-2},k\cdot \varepsilon^{-4}))$ for Euclidean $k$-means and $\tilde O(\min(k^{2} \cdot \varepsilon^{-2},k\cdot \varepsilon^{-3}))$ for Euclidean $k$-median. The best known lower bound for both problems is $\Omega(k \varepsilon^{-2})$. In this paper, we improve these upper bounds to $\tilde O(\min(k^{3/2} \cdot \varepsilon^{-2},k\cdot \varepsilon^{-4}))$ for $k$-means and $\tilde O(\min(k^{4/3} \cdot \varepsilon^{-2},k\cdot \varepsilon^{-3}))$ for $k$-median. In particular, ours is the first provable bound that breaks through the $k^2$ barrier while retaining an optimal dependency on $\varepsilon$.
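The compress-then-solve recipe described above can be illustrated with a minimal sensitivity-sampling sketch. Note this is the classical style of coreset construction (rough solution, sensitivity upper bounds, importance sampling with inverse-probability weights), not the improved construction of this paper; every name, parameter, and the toy instance below are illustrative assumptions.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_cost(points, centers, weights=None):
    """(Weighted) sum of squared distances from each point to its closest center."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(sq_dist(p, c) for c in centers)
               for p, w in zip(points, weights))

def sensitivity_coreset(points, rough_centers, m, rng):
    """Sample a weighted coreset of size m by sensitivity sampling.

    Uses the standard sensitivity upper bound
        s_i ~ cost(p_i) / total_cost + 1 / |cluster(p_i)|
    computed against a rough (constant-factor) solution `rough_centers`.
    """
    k = len(rough_centers)
    # Assign each point to its closest rough center.
    assign = [min(range(k), key=lambda j, p=p: sq_dist(p, rough_centers[j]))
              for p in points]
    total = kmeans_cost(points, rough_centers) + 1e-12
    cluster_size = [max(1, assign.count(j)) for j in range(k)]
    s = [sq_dist(p, rough_centers[assign[i]]) / total
         + 1.0 / cluster_size[assign[i]]
         for i, p in enumerate(points)]
    S = sum(s)
    # Draw m indices with probability proportional to sensitivity.
    idx = rng.choices(range(len(points)), weights=s, k=m)
    coreset = [points[i] for i in idx]
    # Inverse-probability weights make the coreset cost an unbiased
    # estimate of the true cost for any fixed candidate solution.
    weights = [S / (m * s[i]) for i in idx]
    return coreset, weights

# Toy instance: three well-separated clusters in the plane.
rng = random.Random(0)
true_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
          for cx, cy in true_centers for _ in range(100)]

coreset, cs_weights = sensitivity_coreset(points, true_centers, m=100, rng=rng)

# Evaluate an arbitrary candidate solution on full data vs. the coreset:
# the ratio of the two costs should be close to 1.
candidate = [(0.5, -0.5), (9.5, 0.5), (0.5, 9.5)]
ratio = kmeans_cost(coreset, candidate, cs_weights) / kmeans_cost(points, candidate)
```

A real construction would of course come with the $(1\pm\varepsilon)$ guarantee uniformly over all candidate solutions, which is exactly what drives the coreset-size bounds discussed in the abstract; this sketch only checks a single candidate.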
