Emergent Mind

Massively Parallel Algorithms for the Stochastic Block Model

(2307.00530)
Published Jul 2, 2023 in cs.DS and cs.DC

Abstract

Learning the community structure of a large-scale graph is a fundamental problem in machine learning, computer science and statistics. We study the problem of exactly recovering the communities in a graph generated from the Stochastic Block Model (SBM) in the Massively Parallel Computation (MPC) model. Specifically, given $kn$ vertices that are partitioned into $k$ equal-sized clusters (i.e., each has size $n$), a graph on these $kn$ vertices is randomly generated such that each pair of vertices is connected with probability $p$ if they are in the same cluster and with probability $q$ if not, where $p > q > 0$. We give MPC algorithms for the SBM in the (very general) $s$-space MPC model, where each machine has memory $s=\Omega(\log n)$. Under the condition that $\frac{p-q}{\sqrt{p}}\geq \tilde{\Omega}(k^{\frac12}n^{-\frac12+\frac{1}{2(r-1)}})$ for any integer $r\in [3,O(\log n)]$, our first algorithm exactly recovers all the $k$ clusters in $O(kr\log_s n)$ rounds using $\tilde{O}(m)$ total space, or in $O(r\log_s n)$ rounds using $\tilde{O}(km)$ total space. If $\frac{p-q}{\sqrt{p}}\geq \tilde{\Omega}(k^{\frac34}n^{-\frac14})$, our second algorithm achieves $O(\log_s n)$ rounds and $\tilde{O}(m)$ total space complexity. Both algorithms significantly improve upon a recent result of Cohen-Addad et al. [PODC'22], who gave algorithms that only work in the sublinear space MPC model, where each machine has local memory $s=O(n^{\delta})$ for some constant $\delta>0$, with a much stronger condition on $p,q,k$. Our algorithms are based on collecting the $r$-step neighborhood of each vertex and comparing the difference of some statistical information generated from the local neighborhoods for each pair of vertices. To implement the clustering algorithms in parallel, we present efficient approaches for implementing some basic graph operations in the $s$-space MPC model.
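To make the generative model in the abstract concrete, here is a minimal sequential sketch in Python that samples a graph from the SBM with $k$ clusters of size $n$ and edge probabilities $p$ and $q$. The function names `sample_sbm` and `common_neighbors` are hypothetical, and the common-neighbor count is only a simplified stand-in for the "statistical information generated from the local neighborhoods" mentioned above. It is not the paper's algorithm and does not model MPC machines or rounds.

```python
import random
from itertools import combinations


def sample_sbm(k, n, p, q, seed=None):
    """Sample an SBM graph on k*n vertices (hypothetical helper).

    Vertices 0..k*n-1 are split into k consecutive blocks of size n.
    Each pair is joined with probability p inside a block, q across blocks.
    Returns (adjacency dict of sets, list of cluster labels).
    """
    rng = random.Random(seed)
    num_vertices = k * n
    cluster = [v // n for v in range(num_vertices)]
    adj = {v: set() for v in range(num_vertices)}
    for u, v in combinations(range(num_vertices), 2):
        prob = p if cluster[u] == cluster[v] else q
        if rng.random() < prob:
            adj[u].add(v)
            adj[v].add(u)
    return adj, cluster


def common_neighbors(adj, u, v):
    """Count shared neighbors of u and v: a simple 2-step neighborhood statistic.

    Assumption: used here only for illustration; the paper compares
    statistics derived from r-step neighborhoods.
    """
    return len(adj[u] & adj[v])


if __name__ == "__main__":
    adj, cluster = sample_sbm(k=2, n=50, p=0.6, q=0.1, seed=0)
    # Vertices 0 and 1 share a cluster; vertices 0 and 50 do not.
    print("same cluster:", common_neighbors(adj, 0, 1))
    print("different clusters:", common_neighbors(adj, 0, 50))
```

When $p$ is well above $q$, pairs in the same cluster tend to share more neighbors than pairs in different clusters, which is the kind of pairwise gap a neighborhood-based comparison relies on.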
