Degree Distribution Identifiability of Stochastic Kronecker Graphs (2310.00171v1)

Published 29 Sep 2023 in cs.DS and cs.SI

Abstract: Large-scale analysis of the degree distributions of network graphs observed in naturally occurring phenomena has revealed that these degrees follow a power-law or lognormal distribution. Seshadhri, Pinar, and Kolda (J. ACM, 2013) proved that stochastic Kronecker graph (SKG) models cannot generate graphs whose degree distributions follow a power-law or lognormal distribution. As a result, variants of the SKG model have been proposed to generate graphs whose degree distributions approximately follow these distributions without significant oscillations. However, all existing solutions either require significant additional parameterization or have no provable guarantees on the degree distribution.

In this work, we present statistical and computational identifiability notions that imply the separation of SKG models. Specifically, we prove that SKG models in different identifiability classes can be separated by the existence of isolated vertices and connected components in their generated graphs. This could explain the large (i.e., $>50\%$) fraction of isolated vertices in some popular graph generation benchmarks.

We present and analyze an efficient algorithm that can eliminate oscillations in the degree distribution by mixing seeds of relatively prime dimensions. For initial $2\times 1$ and $2\times 2$ seeds, a crucial subroutine of this algorithm solves a degree-2 and a degree-4 optimization problem, respectively, in the variables of the initial seed. We generalize this approach to optimization problems for $m\times n$ seeds, for any $m, n\in\mathbb{N}$.

Using $3\times 3$ seeds alone cannot eliminate significant oscillations: we prove that such seeds yield a degree distribution bounded above by an exponential tail, which therefore cannot be a power-law or lognormal distribution.
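The oscillations and isolated vertices mentioned in the abstract can be made concrete with a short calculation. The Python sketch below is not taken from the paper; it assumes the independent-edge SKG variant, in which each entry of the $k$-th Kronecker power of a $2\times 2$ seed (entries in $[0, 1]$) is the probability of the corresponding edge. Under that assumption, summing a row of the Kronecker power factorizes bit by bit, so a vertex's expected degree depends only on the number $w$ of one-bits in its label: the $2^k$ vertices collapse onto $k+1$ geometrically spaced levels $r_0^{k-w} r_1^{w}$, where $r_0, r_1$ are the seed's row sums. The function names and the example seed are hypothetical, and the isolated-vertex figure is only a Poisson-style estimate.

```python
from math import comb, exp


def expected_degree_levels(seed, k):
    """Return (w, count, expected_degree) for each class of vertices.

    Assumption: the independent-edge SKG variant, in which edge (u, v) is
    present with probability P[u][v] = prod_i seed[u_i][v_i], where u_i and
    v_i are the i-th bits of u and v. In other words, P is the k-th Kronecker
    power of a 2x2 seed whose entries lie in [0, 1].
    """
    r0, r1 = sum(seed[0]), sum(seed[1])  # row sums of the seed
    levels = []
    for w in range(k + 1):
        # Summing P[u][v] over all v factorizes bit by bit, so every vertex
        # whose label has w one-bits has expected degree r0^(k-w) * r1^w
        # (self-loop term included); there are C(k, w) such vertices.
        levels.append((w, comb(k, w), r0 ** (k - w) * r1 ** w))
    return levels


def brute_force_expected_degree(seed, k, u):
    """Sum row u of the Kronecker power directly, to check the formula above."""
    total = 0.0
    for v in range(2 ** k):
        p = 1.0
        for i in range(k):
            p *= seed[(u >> i) & 1][(v >> i) & 1]
        total += p
    return total


def isolated_fraction_estimate(seed, k):
    """Estimate the expected fraction of isolated vertices.

    Treats each degree as roughly Poisson, so P(u isolated) ~ exp(-E[deg u]);
    this is reasonable only when every individual edge probability is small.
    """
    total = sum(count * exp(-d) for _, count, d in expected_degree_levels(seed, k))
    return total / 2 ** k


if __name__ == "__main__":
    seed = [[1.0, 0.5], [0.5, 0.25]]  # hypothetical example seed
    for w, count, d in expected_degree_levels(seed, 3):
        print(w, count, d)
    # Consecutive levels differ by the constant factor r0 / r1 = 1.5 / 0.75 = 2,
    # so expected degrees are geometrically spaced rather than spread smoothly.
    print(isolated_fraction_estimate(seed, 3))
```

Because the levels are separated by the fixed ratio $r_0/r_1$, degrees concentrate around a few widely spaced values instead of filling out a smooth tail, which is one way to see where the oscillations come from; likewise, the many vertices on the lowest levels are the ones most likely to be isolated. The brute-force function is included only to confirm the factorized formula on small $k$.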

References (41)
  1. S. Arora and B. Barak. Computational Complexity: A Modern Approach. Cambridge University Press, 2006.
  2. Emmanuel Abbe. Community detection and stochastic block models: Recent developments. J. Mach. Learn. Res., 18:177:1–177:86, 2017.
  3. Emmanuel Abbe. Community detection and stochastic block models. Found. Trends Commun. Inf. Theory, 14(1-2):1–162, 2018.
  4. Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery. In Venkatesan Guruswami, editor, IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 670–688. IEEE Computer Society, 2015.
  5. Phase transitions for detecting latent geometry in random graphs. Probability Theory and Related Fields, 178:1215 – 1289, 2019.
  6. The method of moments and degree distributions for network models. The Annals of Statistics, 39(5):2280 – 2301, 2011.
  7. Arthur Cayley. A theorem on trees, volume 13 of Cambridge Library Collection - Mathematics, pages 26–28. Cambridge University Press, 2009.
  8. Graph mining: Laws, generators, and algorithms. ACM Comput. Surv., 38(1):2, 2006.
  9. A practical method to reduce privacy loss when disclosing statistics based on small samples. American Economic Review Papers and Proceedings, 109:414–420, 2019.
  10. A model of internet topology using k-shell decomposition. Proceedings of the National Academy of Sciences, 104(27):11150–11154, 2007.
  11. Social capital I: measurement and associations with economic mobility. Nature, 608:1–14, August 2022.
  12. The average distances in random graphs with given expected degrees. Proceedings of the National Academy of Sciences, 99(25):15879–15882, 2002.
  13. Graph500 Steering Committee. Graph 500 benchmark. http://www.graph500.org/. Accessed: 2016-11-30.
  14. Power-law distributions in empirical data. SIAM Rev., 51(4):661–703, 2009.
  15. R-MAT: A recursive model for graph mining. In Proceedings of the Fourth SIAM International Conference on Data Mining, Lake Buena Vista, Florida, USA, April 22-24, 2004, pages 442–446. SIAM, 2004.
  16. Consistent recovery threshold of hidden nearest neighbor graphs. In Conference on Learning Theory, COLT 2020, 9-12 July 2020, Virtual Event [Graz, Austria], volume 125 of Proceedings of Machine Learning Research, pages 1540–1553. PMLR, 2020.
  17. Consistent recovery threshold of hidden nearest neighbor graphs. IEEE Trans. Inf. Theory, 67(8):5211–5229, 2021.
  18. P. Erdős and A. Rényi. On the evolution of random graphs. In Publication of the Mathematical Institute of the Hungarian Academy of Sciences, pages 17–61, 1960.
  19. E. N. Gilbert. Random graphs. The Annals of Mathematical Statistics, 30(4):1141–1144, 1959.
  20. Probabilistic encryption and how to play mental poker keeping secret all partial information. In Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing, STOC ’82, page 365–377, New York, NY, USA, 1982. Association for Computing Machinery.
  21. Probabilistic encryption. Journal of Computer and System Sciences, 28(2):270–299, 1984.
  22. Oded Goldreich. Foundations of Cryptography: Basic Tools. Cambridge University Press, USA, 2000.
  23. Oded Goldreich. Pseudorandomness - part I. In Steven Rudich and Avi Wigderson, editors, Computational Complexity Theory, volume 10 of IAS / Park City mathematics series, pages 253–285. AMS Chelsea Publishing, 2004.
  24. A mathematical analysis of the R-MAT random graph generator. Networks, 58(3):159–170, 2011.
  25. R.W. Keener. Theoretical Statistics: Topics for a Core Course. Springer Texts in Statistics. Springer New York, 2010.
  26. Multiplicative attribute graph model of real-world networks. Internet Math., 8(1-2):113–160, 2012.
  27. Properties of stochastic Kronecker graphs. Journal of Combinatorics, 6:395–432, 2015.
  28. Structure and evolution of online social networks. In Tina Eliassi-Rad, Lyle H. Ungar, Mark Craven, and Dimitrios Gunopulos, editors, Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20-23, 2006, pages 611–617. ACM, 2006.
  29. Counting triangles in massive graphs with MapReduce. SIAM Journal on Scientific Computing, 36(5):S44–S77, October 2014.
  30. Kronecker graphs: An approach to modeling networks. J. Mach. Learn. Res., 11:985–1042, March 2010.
  31. Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication. In Alípio Jorge, Luís Torgo, Pavel Brazdil, Rui Camacho, and João Gama, editors, Knowledge Discovery in Databases: PKDD 2005, 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, Porto, Portugal, October 3-7, 2005, Proceedings, volume 3721 of Lecture Notes in Computer Science, pages 133–145. Springer, 2005.
  32. Daniel Wyatt Margo. Sorting Shapes the Performance of Graph-Structured Systems. PhD thesis, Harvard University, 2017.
  33. Sequential monte carlo for sampling balanced and compact redistricting plans. Annals of Applied Statistics, Forthcoming, 2023.
  34. Michael Mitzenmacher. A brief history of generative models for power law and lognormal distributions. Internet Mathematics, 1:226–251, 2003.
  35. Michael Mitzenmacher. Editorial: The future of power law research. Internet Mathematics, 2(4):525–534, 2005.
  36. Tied Kronecker product graph models to capture variance in network populations. ACM Trans. Knowl. Discov. Data, 12(3):35:1–35:40, 2018.
  37. Stochastic Kronecker graphs. In Algorithms and Models for the Web-Graph, 5th International Workshop, WAW 2007, San Diego, CA, USA, December 11-12, 2007, Proceedings, pages 179–186, 2007.
  38. M. E. J. Newman. Power laws, Pareto distributions and Zipf’s law. Contemporary Physics, 2005.
  39. Robust de-anonymization of large sparse datasets. In 2008 IEEE Symposium on Security and Privacy (S&P 2008), 18-21 May 2008, Oakland, California, USA, pages 111–125. IEEE Computer Society, 2008.
  40. The similarity between stochastic Kronecker and Chung-Lu graph models. In Proceedings of the Twelfth SIAM International Conference on Data Mining, Anaheim, California, USA, April 26-28, 2012, pages 1071–1082. SIAM / Omnipress, 2012.
  41. C. Seshadhri, Ali Pinar, and Tamara G. Kolda. An in-depth analysis of stochastic Kronecker graphs. J. ACM, 60(2):13:1–13:32, May 2013.
Authors (2)
  1. Daniel Alabi
  2. Dimitris Kalimeris