GRAG: Graph Retrieval-Augmented Generation (2405.16506v2)
Abstract: Naive Retrieval-Augmented Generation (RAG) focuses on individual documents during retrieval and, as a result, falls short when handling networked documents, which are common in many applications such as citation graphs, social media, and knowledge graphs. To overcome this limitation, we introduce Graph Retrieval-Augmented Generation (GRAG), which tackles the fundamental challenges of retrieving textual subgraphs and integrating joint textual and topological information into LLMs to enhance their generation. To enable efficient textual subgraph retrieval, we propose a novel divide-and-conquer strategy that retrieves the optimal subgraph structure in linear time. To achieve graph context-aware generation, we incorporate textual graphs into LLMs through two complementary views, the text view and the graph view, enabling LLMs to more effectively comprehend and utilize the graph context. Extensive experiments on graph reasoning benchmarks demonstrate that in scenarios requiring multi-hop reasoning on textual graphs, our GRAG approach significantly outperforms current state-of-the-art RAG methods.
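The abstract only sketches the retrieval mechanism, so below is a minimal illustrative sketch of the general idea rather than the paper's actual algorithm: score the k-hop ego-graph around each node by its similarity to the query (a naive stand-in for the divide-and-conquer strategy, which the paper optimizes to run in linear time), then verbalize the best subgraphs into edge sentences that can be appended to the LLM prompt as the text view. All function names, parameters, and the toy hashing encoder here are assumptions made for illustration; a real system would use a learned text encoder such as SentenceBERT.

```python
# Illustrative sketch of GRAG-style textual subgraph retrieval.
# Names like retrieve_ego_graphs and the toy embed() are assumptions,
# not the paper's API.

import hashlib
import math
import networkx as nx


def embed(text: str, dim: int = 64) -> list[float]:
    """Toy hashing-based bag-of-words encoder; a stand-in for a real
    text embedding model."""
    vec = [0.0] * dim
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already L2-normalized, so the dot product is cosine.
    return sum(x * y for x, y in zip(a, b))


def retrieve_ego_graphs(graph: nx.Graph, query: str,
                        k_hops: int = 1, top_n: int = 2) -> list[nx.Graph]:
    """Score each node's k-hop ego-graph by the mean query similarity
    of its node texts; return the top_n subgraphs."""
    q = embed(query)
    scored = []
    for node in graph.nodes:
        ego = nx.ego_graph(graph, node, radius=k_hops)
        score = sum(cosine(q, embed(graph.nodes[n]["text"])) for n in ego) / len(ego)
        scored.append((score, ego))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ego for _, ego in scored[:top_n]]


def verbalize(graph: nx.Graph, subgraph: nx.Graph) -> str:
    """Text view: flatten a retrieved subgraph into edge sentences
    suitable for inclusion in an LLM prompt."""
    lines = [f"{graph.nodes[u]['text']} -> {graph.nodes[v]['text']}"
             for u, v in subgraph.edges]
    return "\n".join(lines)


if __name__ == "__main__":
    g = nx.Graph()
    g.add_node(0, text="paper on graph neural networks")
    g.add_node(1, text="paper on retrieval augmented generation")
    g.add_node(2, text="survey of large language models")
    g.add_edges_from([(0, 1), (1, 2)])
    for sub in retrieve_ego_graphs(g, "retrieval augmented generation on graphs"):
        print(verbalize(g, sub), "\n---")
```

In the paper's framing, this text view would be paired with a graph view (e.g., learned graph embeddings that preserve topology) so the LLM receives both the verbalized edges and the structure; the sketch above covers only the text-view half of that design.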