On Approximating the Number of $k$-cliques in Sublinear Time (1707.04858v2)

Published 16 Jul 2017 in cs.DS

Abstract: We study the problem of approximating the number of $k$-cliques in a graph when given query access to the graph. We consider the standard query model for general graphs via (1) degree queries, (2) neighbor queries and (3) pair queries. Let $n$ denote the number of vertices in the graph, $m$ the number of edges, and $C_k$ the number of $k$-cliques. We design an algorithm that outputs a $(1+\varepsilon)$-approximation (with high probability) for $C_k$, whose expected query complexity and running time are $O\left(\frac{n}{C_k^{{1/k}}+\frac{m^{{k/2}}{C_k}\right)\poly(\log}} n,1/\varepsilon,k)$. Hence, the complexity of the algorithm is sublinear in the size of the graph for $C_k = \omega(m^{k/2-1})$. Furthermore, we prove a lower bound showing that the query complexity of our algorithm is essentially optimal (up to the dependence on $\log n$, $1/\varepsilon$ and $k$). The previous results in this vein are by Feige (SICOMP 06) and by Goldreich and Ron (RSA 08) for edge counting ($k=2$) and by Eden et al. (FOCS 2015) for triangle counting ($k=3$). Our result matches the complexities of these results. The previous result by Eden et al. hinges on a certain amortization technique that works only for triangle counting, and does not generalize for larger cliques. We obtain a general algorithm that works for any $k\geq 3$ by designing a procedure that samples each $k$-clique incident to a given set $S$ of vertices with approximately equal probability. The primary difficulty is in finding cliques incident to purely high-degree vertices, since random sampling within neighbors has a low success probability. This is achieved by an algorithm that samples uniform random high degree vertices and a careful tradeoff between estimating cliques incident purely to high-degree vertices and those that include a low-degree vertex.

Citations (77)

View on Semantic Scholar