On Approximating the Number of $k$-cliques in Sublinear Time (1707.04858v2)
Abstract: We study the problem of approximating the number of $k$-cliques in a graph when given query access to the graph. We consider the standard query model for general graphs via (1) degree queries, (2) neighbor queries and (3) pair queries. Let $n$ denote the number of vertices in the graph, $m$ the number of edges, and $C_k$ the number of $k$-cliques. We design an algorithm that outputs a $(1+\varepsilon)$-approximation (with high probability) for $C_k$, whose expected query complexity and running time are $O\left(\frac{n}{C_k{1/k}}+\frac{m{k/2}}{C_k}\right)\poly(\log n,1/\varepsilon,k)$. Hence, the complexity of the algorithm is sublinear in the size of the graph for $C_k = \omega(m{k/2-1})$. Furthermore, we prove a lower bound showing that the query complexity of our algorithm is essentially optimal (up to the dependence on $\log n$, $1/\varepsilon$ and $k$). The previous results in this vein are by Feige (SICOMP 06) and by Goldreich and Ron (RSA 08) for edge counting ($k=2$) and by Eden et al. (FOCS 2015) for triangle counting ($k=3$). Our result matches the complexities of these results. The previous result by Eden et al. hinges on a certain amortization technique that works only for triangle counting, and does not generalize for larger cliques. We obtain a general algorithm that works for any $k\geq 3$ by designing a procedure that samples each $k$-clique incident to a given set $S$ of vertices with approximately equal probability. The primary difficulty is in finding cliques incident to purely high-degree vertices, since random sampling within neighbors has a low success probability. This is achieved by an algorithm that samples uniform random high degree vertices and a careful tradeoff between estimating cliques incident purely to high-degree vertices and those that include a low-degree vertex.
Collections
Sign up for free to add this paper to one or more collections.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.