Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
167 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
42 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Robust Coreset Construction for Distributed Machine Learning (1904.05961v3)

Published 11 Apr 2019 in cs.LG, cs.DS, and stat.ML

Abstract: Coreset, which is a summary of the original dataset in the form of a small weighted set in the same sample space, provides a promising approach to enable machine learning over distributed data. Although viewed as a proxy of the original dataset, each coreset is only designed to approximate the cost function of a specific machine learning problem, and thus different coresets are often required to solve different machine learning problems, increasing the communication overhead. We resolve this dilemma by developing robust coreset construction algorithms that can support a variety of machine learning problems. Motivated by empirical evidence that suitably-weighted k-clustering centers provide a robust coreset, we harden the observation by establishing theoretical conditions under which the coreset provides a guaranteed approximation for a broad range of machine learning problems, and developing both centralized and distributed algorithms to generate coresets satisfying the conditions. The robustness of the proposed algorithms is verified through extensive experiments on diverse datasets with respect to both supervised and unsupervised learning problems.

Citations (25)

Summary

We haven't generated a summary for this paper yet.