
Pitfalls in Link Prediction with Graph Neural Networks: Understanding the Impact of Target-link Inclusion & Better Practices (2306.00899v2)

Published 1 Jun 2023 in cs.LG, cs.IR, and cs.SI

Abstract: While Graph Neural Networks (GNNs) are remarkably successful in a variety of high-impact applications, we demonstrate that, in link prediction, the common practices of including the edges being predicted in the graph at training and/or test have outsized impact on the performance of low-degree nodes. We theoretically and empirically investigate how these practices impact node-level performance across different degrees. Specifically, we explore three issues that arise: (I1) overfitting; (I2) distribution shift; and (I3) implicit test leakage. The former two issues lead to poor generalizability to the test data, while the latter leads to overestimation of the model's performance and directly impacts the deployment of GNNs. To address these issues in a systematic way, we introduce an effective and efficient GNN training framework, SpotTarget, which leverages our insight on low-degree nodes: (1) at training time, it excludes a (training) edge to be predicted if it is incident to at least one low-degree node; and (2) at test time, it excludes all test edges to be predicted (thus, mimicking real scenarios of using GNNs, where the test data is not included in the graph). SpotTarget helps researchers and practitioners adhere to best practices for learning from graph data, which are frequently overlooked even by the most widely-used frameworks. Our experiments on various real-world datasets show that SpotTarget makes GNNs up to 15x more accurate in sparse graphs, and significantly improves their performance for low-degree nodes in dense graphs.
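The two exclusion rules described in the abstract can be sketched in plain Python. This is a minimal illustration of the stated idea, not the paper's actual implementation: the function names and the degree threshold `delta` are assumptions for this sketch.

```python
# Sketch of SpotTarget's edge-exclusion rules as described in the abstract.
# Training time: drop a target (training) edge from the message-passing
# graph if it is incident to at least one low-degree node.
# Test time: drop all test target edges, mimicking deployment, where the
# links being predicted are not present in the graph.
from collections import Counter

def spot_target_filter(graph_edges, target_edges, delta):
    """Return graph edges, minus target edges touching a low-degree node.

    graph_edges:  list of (u, v) tuples forming the message-passing graph
    target_edges: list of (u, v) tuples whose links are being predicted
    delta:        degree threshold (assumed hyperparameter); a target edge
                  is excluded if either endpoint has degree below delta
    """
    degree = Counter()
    for u, v in graph_edges:
        degree[u] += 1
        degree[v] += 1

    exclude = {
        tuple(sorted(e)) for e in target_edges
        if degree[e[0]] < delta or degree[e[1]] < delta
    }
    return [e for e in graph_edges if tuple(sorted(e)) not in exclude]

def exclude_all_targets(graph_edges, target_edges):
    """Test-time rule: remove every target edge from the graph."""
    targets = {tuple(sorted(e)) for e in target_edges}
    return [e for e in graph_edges if tuple(sorted(e)) not in targets]
```

On a toy graph, a target edge incident to a node of degree below `delta` is removed at training time, while at test time every target edge is removed regardless of degree.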

Authors (7)
  1. Jing Zhu
  2. Yuhang Zhou
  3. Vassilis N. Ioannidis
  4. Shengyi Qian
  5. Wei Ai
  6. Xiang Song
  7. Danai Koutra
Citations (7)
