Emergent Mind

Distributed Data Summarization in Well-Connected Networks

(1908.00236)
Published Aug 1, 2019 in cs.DS and cs.DC

Abstract

We study distributed algorithms for some fundamental problems in data summarization. Given a communication graph $G$ of $n$ nodes, each of which may hold a value initially, we focus on computing $\sum_{i=1}^{N} g(f_i)$, where $f_i$ is the number of occurrences of value $i$ and $g$ is some fixed function. This includes important statistics such as the number of distinct elements, frequency moments, and the empirical entropy of the data. In the CONGEST model, a simple adaptation from streaming lower bounds shows that it requires $\tilde{\Omega}(D + n)$ rounds, where $D$ is the diameter of the graph, to compute some of these statistics exactly. However, these lower bounds do not hold for graphs that are well-connected. We give an algorithm that computes $\sum_{i=1}^{N} g(f_i)$ exactly in $\tau_G \cdot 2^{O(\sqrt{\log n})}$ rounds, where $\tau_G$ is the mixing time of $G$. This also has applications in computing the top $k$ most frequent elements. We demonstrate that there is a high similarity between the GOSSIP model and the CONGEST model in well-connected graphs. In particular, we show that each round of the GOSSIP model can be simulated almost-perfectly in $\tilde{O}(\tau_G)$ rounds of the CONGEST model. To this end, we develop a new algorithm for the GOSSIP model that $(1 \pm \epsilon)$-approximates the $p$-th frequency moment $F_p = \sum_{i=1}^{N} f_i^p$ in $\tilde{O}(\epsilon^{-2} n^{1-k/p})$ rounds, for $p \geq 2$, when the number of distinct elements $F_0$ is at most $O\left(n^{1/(k-1)}\right)$. This result can be translated back to the CONGEST model with a factor $\tilde{O}(\tau_G)$ blow-up in the number of rounds.
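To make the target quantity concrete, the following is a minimal centralized sketch of the statistics the abstract names. It is not the paper's distributed algorithm: it assumes all values have already been collected in one place, and it assumes $g(0) = 0$ so that only values that actually occur contribute to the sum. The function names (`summarize`, `distinct_elements`, `frequency_moment`, `empirical_entropy`, `top_k`) are hypothetical, and the entropy uses the standard definition $-\sum_i (f_i/m) \log_2 (f_i/m)$ with $m = \sum_i f_i$, which is assumed here rather than taken from the paper.

```python
import math
from collections import Counter


def summarize(values, g):
    """Return sum of g(f_i) over values that occur at least once.

    Assumes g(0) = 0, so values with zero occurrences are skipped.
    """
    freqs = Counter(values)  # f_i = number of occurrences of value i
    return sum(g(f) for f in freqs.values())


def distinct_elements(values):
    # F_0: g(f) = 1 for every value that occurs
    return summarize(values, lambda f: 1)


def frequency_moment(values, p):
    # F_p: g(f) = f^p
    return summarize(values, lambda f: f ** p)


def empirical_entropy(values):
    # Standard empirical entropy in bits (assumed definition)
    m = len(values)
    return summarize(values, lambda f: -(f / m) * math.log2(f / m))


def top_k(values, k):
    # The k most frequent values with their counts
    return Counter(values).most_common(k)


if __name__ == "__main__":
    data = [3, 1, 3, 2, 3, 1]  # one value per node, gathered centrally
    print(distinct_elements(data))
    print(frequency_moment(data, 2))
    print(empirical_entropy(data))
    print(top_k(data, 1))
```

A distributed algorithm in the CONGEST or GOSSIP model has to reach the same answers without any single node seeing the whole list, which is where the round bounds in the abstract come from.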
