Retrieval-enhanced Knowledge Editing in Language Models for Multi-Hop Question Answering (2403.19631v2)
Abstract: Large language models (LLMs) have shown proficiency in question-answering tasks but often struggle to integrate real-time knowledge, leading to potentially outdated or inaccurate responses. The problem becomes even more challenging for multi-hop questions, which require LLMs to update and integrate multiple pieces of knowledge relevant to the question. To tackle this, we propose the Retrieval-Augmented model Editing (RAE) framework for multi-hop question answering. RAE first retrieves edited facts and then refines the LLM through in-context learning. Specifically, our retrieval approach, based on mutual information maximization, leverages the reasoning abilities of LLMs to identify chain facts that traditional similarity-based searches might miss. In addition, our framework includes a pruning strategy that eliminates redundant information from the retrieved facts, which improves editing accuracy and mitigates hallucination. We also provide theoretical justification for the efficacy of our fact retrieval. Finally, a comprehensive evaluation across various LLMs validates RAE's ability to provide accurate answers with updated knowledge. Our code is available at: https://github.com/sycny/RAE.
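To make the three-stage pipeline concrete (retrieve edited facts, prune redundant ones, answer via in-context learning), below is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation (see the linked repository for that): the mutual-information objective is approximated here by a greedy log-likelihood gain, GPT-2 stands in for the evaluated LLMs, and all function names (`question_likelihood`, `retrieve_chain`, `prune`, `answer`) and the prompt format are hypothetical.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 as a small stand-in LLM; the paper evaluates larger models.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

@torch.no_grad()
def avg_log_likelihood(context: str, target: str) -> float:
    """Average per-token log p(target | context) under the LM."""
    ctx = tokenizer(context, return_tensors="pt").input_ids
    tgt = tokenizer(target, return_tensors="pt").input_ids
    ids = torch.cat([ctx, tgt], dim=1)
    logits = model(ids).logits
    n_ctx = ctx.shape[1]
    # Logits at position i predict token i + 1, so this slice aligns
    # exactly with the target tokens.
    log_probs = torch.log_softmax(logits[0, n_ctx - 1 : -1], dim=-1)
    return log_probs.gather(1, tgt[0].unsqueeze(1)).mean().item()

def question_likelihood(facts: list, question: str) -> float:
    """How likely the question text is, given the selected facts."""
    ctx = ("Facts: " + " ".join(facts) + "\n") if facts else ""
    return avg_log_likelihood(ctx + "Question:", " " + question)

def retrieve_chain(question: str, candidates: list, max_hops: int = 2) -> list:
    """Greedy proxy for the mutual-information objective: at each hop,
    add the fact that most increases the likelihood of the question."""
    chain, pool = [], list(candidates)
    score = question_likelihood(chain, question)
    for _ in range(max_hops):
        if not pool:
            break
        gains = [(question_likelihood(chain + [f], question) - score, f)
                 for f in pool]
        best_gain, best_fact = max(gains)
        if best_gain <= 0:
            break
        chain.append(best_fact)
        pool.remove(best_fact)
        score += best_gain
    return chain

def prune(chain: list, question: str) -> list:
    """Drop facts whose removal does not make the question less likely,
    mimicking the redundancy-pruning step described in the abstract."""
    kept = list(chain)
    for fact in list(chain):
        reduced = [f for f in kept if f != fact]
        if reduced and question_likelihood(reduced, question) >= \
                question_likelihood(kept, question):
            kept = reduced
    return kept

def answer(question: str, edited_facts: list) -> str:
    """Retrieve, prune, then answer via in-context learning."""
    facts = prune(retrieve_chain(question, edited_facts), question)
    prompt = ("Facts: " + " ".join(facts) +
              "\nQuestion: " + question + "\nAnswer:")
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=8, do_sample=False,
                         pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(out[0, ids.shape[1]:],
                            skip_special_tokens=True).strip()

# Toy multi-hop example: two edited facts form the chain, one is a distractor.
edited_facts = [
    "The CEO of Acme is Jane Doe.",
    "Jane Doe was born in Toronto.",
    "Acme was founded in 1999.",
]
print(answer("In which city was the CEO of Acme born?", edited_facts))
```

The greedy likelihood gain is only a crude stand-in for the paper's mutual-information maximization, but it captures the key idea the abstract emphasizes: chain facts are selected by how much they help the model explain the question, rather than by embedding similarity alone, which is how second-hop facts that share no surface overlap with the question can still be retrieved.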