A Survey of Large Language Models for Graphs (2405.08011v3)
Abstract: Graphs are an essential data structure for representing relationships in real-world scenarios. Prior research has established that Graph Neural Networks (GNNs) deliver impressive results on graph-centric tasks such as link prediction and node classification. Despite these advances, challenges such as data sparsity and limited generalization persist. Recently, Large Language Models (LLMs) have gained attention in natural language processing for their strength in language comprehension and summarization. Integrating LLMs with graph learning techniques has attracted interest as a way to improve performance on graph learning tasks. In this survey, we conduct an in-depth review of the latest state-of-the-art LLMs applied to graph learning and introduce a novel taxonomy that categorizes existing methods by framework design. We detail four distinct designs: i) GNNs as Prefix, ii) LLMs as Prefix, iii) LLMs-Graphs Integration, and iv) LLMs-Only, highlighting the key methodologies within each category. We examine the strengths and limitations of each framework and emphasize promising avenues for future research, including overcoming current integration challenges between LLMs and graph learning techniques and venturing into new application areas. This survey aims to serve as a valuable resource for researchers and practitioners eager to leverage LLMs in graph learning, and to inspire continued progress in this dynamic field. We consistently maintain the related open-source materials at \url{https://github.com/HKUDS/Awesome-LLM4Graph-Papers}.
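To make the taxonomy concrete, the "GNNs as Prefix" design can be illustrated with a minimal sketch: a GNN encodes the graph into node embeddings, a learned projector maps them into the LLM's token-embedding space, and the resulting "graph tokens" are prepended to the text-token embeddings before the LLM processes the sequence. All dimensions, weights, and names below are illustrative assumptions, not taken from any specific method in the survey.

```python
import numpy as np

# Hypothetical sizes (assumptions for illustration only).
D_GNN, D_LLM, N_NODES, N_TEXT = 64, 128, 5, 10

rng = np.random.default_rng(0)

# 1) A GNN encodes the graph into per-node embeddings. Stand-in here:
#    one mean-aggregation message-passing step over an adjacency matrix.
adj = rng.integers(0, 2, size=(N_NODES, N_NODES)).astype(float)
feats = rng.normal(size=(N_NODES, D_GNN))
deg = adj.sum(axis=1, keepdims=True) + 1e-9   # avoid divide-by-zero
node_emb = (adj @ feats) / deg                # mean of neighbor features

# 2) A learned projector maps GNN outputs into the LLM embedding space.
W_proj = rng.normal(size=(D_GNN, D_LLM))
graph_tokens = node_emb @ W_proj              # shape (N_NODES, D_LLM)

# 3) Graph tokens serve as a "prefix": they are concatenated in front of
#    the text-token embeddings that the (typically frozen) LLM consumes.
text_tokens = rng.normal(size=(N_TEXT, D_LLM))
llm_input = np.concatenate([graph_tokens, text_tokens], axis=0)

print(llm_input.shape)  # (15, 128)
```

In practice the projector (and sometimes the GNN) is trained end-to-end against the LLM's objective, while "LLMs as Prefix" reverses the flow: LLM-produced text embeddings or labels feed into a downstream GNN instead.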