
$\varepsilon$-Coresets for Clustering (with Outliers) in Doubling Metrics (1804.02530v2)

Published 7 Apr 2018 in cs.DS

Abstract: We study the problem of constructing $\varepsilon$-coresets for the $(k, z)$-clustering problem in a doubling metric $M(X, d)$. An $\varepsilon$-coreset is a weighted subset $S\subseteq X$ with weight function $w : S \rightarrow \mathbb{R}_{\geq 0}$ such that, for every $k$-subset $C \in [X]^k$, it holds that $\sum_{x \in S} w(x) \cdot d^z(x, C) \in (1 \pm \varepsilon) \cdot \sum_{x \in X} d^z(x, C)$, where $d(x, C) = \min_{c \in C} d(x, c)$. We present an efficient algorithm that constructs an $\varepsilon$-coreset for the $(k, z)$-clustering problem in $M(X, d)$, where the size of the coreset depends only on the parameters $k, z, \varepsilon$ and the doubling dimension $\mathsf{ddim}(M)$. To the best of our knowledge, this is the first efficient $\varepsilon$-coreset construction of size independent of $|X|$ for general clustering problems in doubling metrics. To this end, we establish the first relation between the doubling dimension of $M(X, d)$ and the shattering dimension (or VC-dimension) of the range space induced by the distance $d$. Such a relation was not known before, since one can easily construct instances in which neither one can be bounded by (some function of) the other. Surprisingly, we show that if we allow a small $(1\pm\varepsilon)$-distortion of the distance function $d$ and consider the notion of $\tau$-error probabilistic shattering dimension, we can prove an upper bound of $O(\mathsf{ddim}(M)\cdot \log(1/\varepsilon) + \log\log(1/\tau))$ on the probabilistic shattering dimension, even for weighted doubling metrics. We believe this new relation is of independent interest and may find other applications. We also study robust coresets and centroid sets in doubling metrics. Our robust coreset construction leads to new results in clustering and property testing, and the centroid sets can be used to accelerate local search algorithms for clustering problems.
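To make the coreset definition concrete, here is a minimal brute-force checker. It is a sketch under stated assumptions: the metric is a small finite point set with a Python distance function, and the names `clustering_cost` and `is_eps_coreset` are hypothetical. It only verifies the inequality from the definition by enumerating every $k$-subset $C \in [X]^k$, which takes $\binom{|X|}{k}$ iterations, so it is suitable for toy inputs only and is not the construction proposed in the paper.

```python
import itertools


def dist_to_set(x, C, d):
    """Distance from point x to center set C: min over c in C of d(x, c)."""
    return min(d(x, c) for c in C)


def clustering_cost(points, C, d, z, weights=None):
    """(k, z)-clustering cost: sum over x of w(x) * d(x, C)^z (w = 1 if unweighted)."""
    total = 0.0
    for x in points:
        w = 1.0 if weights is None else weights[x]
        total += w * dist_to_set(x, C, d) ** z
    return total


def is_eps_coreset(X, S_weights, d, k, z, eps):
    """Check the epsilon-coreset inequality for every k-subset C of X.

    S_weights maps each coreset point (a member of X) to its nonnegative weight.
    Exhaustive enumeration: exponential in k, for illustration only.
    """
    S = list(S_weights)
    for C in itertools.combinations(X, k):
        full = clustering_cost(X, C, d, z)
        approx = clustering_cost(S, C, d, z, S_weights)
        if not ((1 - eps) * full <= approx <= (1 + eps) * full):
            return False
    return True


if __name__ == "__main__":
    # Toy 1-D metric: two well-separated groups on the real line.
    X = [0, 1, 2, 10, 11, 12]
    d = lambda a, b: abs(a - b)
    trivial = {x: 1.0 for x in X}          # S = X with unit weights
    collapsed = {0: 6.0}                   # all mass on a single point
    print(is_eps_coreset(X, trivial, d, k=2, z=1, eps=0.1))
    print(is_eps_coreset(X, collapsed, d, k=2, z=1, eps=0.1))
```

The trivial coreset ($S = X$ with unit weights) always satisfies the definition, while placing all weight on one point fails as soon as some center set $C$ contains that point but leaves other points far away. The brute-force loop over all $C$ is exactly what makes the definition demanding, and it is why size bounds that avoid dependence on $|X|$ require structural arguments such as the shattering-dimension relation described in the abstract.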

