Streaming PTAS for Constrained k-Means

(1909.07511)
Published Sep 16, 2019 in cs.DS

Abstract

We generalise the results of Bhattacharya et al. (Theory of Computing Systems, 62(1):93-115, 2018) for the list-$k$-means problem, defined as follows: for an (unknown) partition $X_1, \ldots, X_k$ of the dataset $X \subseteq \mathbb{R}^d$, find a list of $k$-center sets (each element in the list is a set of $k$ centers) such that at least one of the $k$-center sets $\{c_1, \ldots, c_k\}$ in the list gives a $(1+\varepsilon)$-approximation with respect to the cost function $\min_{\textrm{permutation } \pi} \left[ \sum_{i=1}^{k} \sum_{x \in X_i} ||x - c_{\pi(i)}||^2 \right]$. The list-$k$-means problem is important for the constrained $k$-means problem, since algorithms for the former can be converted into PTASs for various versions of the latter. Our generalisation has the following consequences:

- Streaming algorithm: Our $D^2$-sampling based algorithm, which runs in a single iteration, allows us to design a 2-pass, logspace streaming algorithm for the list-$k$-means problem. This can be converted into a 4-pass, logspace streaming PTAS for various constrained versions of the $k$-means problem.

- Faster PTAS under stability: Our generalisation is also useful in $k$-means clustering scenarios where finding good centers becomes easy once good centers for a few "bad" clusters have been chosen. One such scenario is clustering under stability, where the number of such bad clusters is a constant. Using the above idea, we significantly improve the running time of the known algorithm from $O(dn^3)(k \log{n})^{poly(\frac{1}{\beta}, \frac{1}{\varepsilon})}$ to $O\left( dn^3 k^{\tilde{O}_{\beta\varepsilon}(\frac{1}{\beta\varepsilon})} \right)$.
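To make the two ingredients named in the abstract concrete, here is a minimal, illustrative Python sketch of (i) $D^2$-sampling, the seeding primitive the streaming algorithm builds on, and (ii) the list-$k$-means cost, which minimises over permutations of the candidate centers against a fixed target partition. All function names and the overall structure are hypothetical; this is not the paper's streaming algorithm, which uses $D^2$-sampling in a more involved way to produce a list of candidate center sets.

```python
# Hypothetical sketch of D^2-sampling and the list-k-means cost function.
# Not the paper's algorithm; intended only to illustrate the primitives.
import itertools
import numpy as np


def d2_sample_centers(X, k, rng=None):
    """Pick k centers from X, each subsequent point chosen with probability
    proportional to its squared distance to the nearest center so far."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    centers = [X[rng.integers(n)]]  # first center: uniform at random
    for _ in range(k - 1):
        # squared distance of every point to its closest chosen center
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()
        centers.append(X[rng.choice(n, p=probs)])
    return np.array(centers)


def list_kmeans_cost(partition, centers):
    """Cost of a k-center set against a fixed target partition X_1, ..., X_k:
    min over permutations pi of sum_i sum_{x in X_i} ||x - c_{pi(i)}||^2."""
    k = len(partition)
    best = np.inf
    for pi in itertools.permutations(range(k)):
        cost = sum(
            np.sum((partition[i] - centers[pi[i]]) ** 2) for i in range(k)
        )
        best = min(best, cost)
    return best
```

The permutation minimum is only computed explicitly here for small $k$; its role in the problem statement is that the list succeeds if any $k$-center set in it achieves a $(1+\varepsilon)$-approximation under the best matching of centers to the (unknown) clusters.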
