Implicit degree bias in the link prediction task (2405.14985v2)
Abstract: Link prediction, the task of distinguishing actual hidden edges from random unconnected node pairs, is one of the quintessential tasks in graph machine learning. Although it is widely accepted as a universal benchmark and a downstream task for representation learning, the validity of the link prediction benchmark itself has rarely been questioned. Here, we show that the common edge sampling procedure in the link prediction task has an implicit bias toward high-degree nodes and produces a highly skewed evaluation that favors methods overly dependent on node degree, to the extent that a "null" link prediction method based solely on node degree can yield nearly optimal performance. We propose a degree-corrected link prediction task that offers a more reasonable assessment and aligns better with performance in the recommendation task. Finally, we demonstrate that the degree-corrected benchmark can more effectively train graph machine-learning models by reducing overfitting to node degrees and facilitating the learning of relevant structures in graphs.
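The degree bias described in the abstract can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration and is not taken from the paper: the synthetic preferential-attachment graph, the degree-product score standing in for the "null" degree-only predictor, and the reading of "degree-corrected" as drawing negative node pairs with probability proportional to degree. All function names are hypothetical.

```python
import random
from collections import Counter


def barabasi_albert_edges(n, m, rng):
    """Grow a graph by preferential attachment; returns a set of (u, v) with u < v."""
    edges = set()
    # A node appears in `stubs` once per incident edge (seed nodes start with
    # one entry each), so a uniform draw from it is roughly degree-proportional.
    stubs = list(range(m))
    for new in range(m, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.add((t, new))  # t < new always holds
            stubs.extend([t, new])
    return edges


def auc(pos_scores, neg_scores):
    """Probability that a positive outscores a negative (ties count as 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sample_negatives(k, draw_node, edges):
    """Draw k unconnected node pairs, with endpoints produced by `draw_node`."""
    negatives = []
    while len(negatives) < k:
        u, v = draw_node(), draw_node()
        pair = (min(u, v), max(u, v))
        if u != v and pair not in edges:
            negatives.append(pair)
    return negatives


def run_demo(seed=0, n=1000, m=3, test_fraction=0.2):
    rng = random.Random(seed)
    edges = barabasi_albert_edges(n, m, rng)
    edge_list = sorted(edges)
    rng.shuffle(edge_list)
    n_test = int(test_fraction * len(edge_list))
    positives, train = edge_list[:n_test], edge_list[n_test:]

    # Degrees are measured on the training graph only (held-out edges hidden).
    degree = Counter()
    for u, v in train:
        degree[u] += 1
        degree[v] += 1
    train_stubs = [x for edge in train for x in edge]

    def score(pair):
        # Degree-only "null" predictor: product of the endpoint degrees.
        return degree[pair[0]] * degree[pair[1]]

    pos_scores = [score(p) for p in positives]

    # Common setup: negatives are uniformly random unconnected node pairs.
    uniform_neg = sample_negatives(n_test, lambda: rng.randrange(n), edges)
    # Assumed correction: negative endpoints drawn proportionally to degree.
    corrected_neg = sample_negatives(n_test, lambda: rng.choice(train_stubs), edges)

    auc_uniform = auc(pos_scores, [score(p) for p in uniform_neg])
    auc_corrected = auc(pos_scores, [score(p) for p in corrected_neg])
    return auc_uniform, auc_corrected


if __name__ == "__main__":
    auc_uniform, auc_corrected = run_demo()
    print(f"degree-only AUC, uniform negatives:          {auc_uniform:.3f}")
    print(f"degree-only AUC, degree-matched negatives:   {auc_corrected:.3f}")
```

Under these assumptions, the degree-only score should separate held-out edges from uniformly sampled negatives well above chance, and much less so once the negatives are degree-matched. The exact numbers depend on the synthetic graph and the random seed.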