MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory (2404.11672v3)

Published 17 Apr 2024 in cs.CL

Abstract: While current LLMs perform well on many knowledge-related tasks, they are limited by relying on their parameters as an implicit storage mechanism. As a result, they struggle with memorizing rare events and with updating their memory as facts change over time. In addition, the uninterpretable nature of parametric memory makes it challenging to prevent hallucination. Model editing and augmenting LLMs with parameters specialized for memory are only partial solutions. In this paper, we introduce MemLLM, a novel method of enhancing LLMs by integrating a structured and explicit read-and-write memory module. MemLLM tackles the aforementioned challenges by enabling dynamic interaction with the memory and improving the LLM's capabilities in using stored knowledge. Our experiments indicate that MemLLM enhances the LLM's performance and interpretability, in language modeling in general and knowledge-intensive tasks in particular. We see MemLLM as an important step towards making LLMs more grounded and factual through memory augmentation. The project repository is publicly available at https://github.com/amodaresi/MemLLM


Summary

  • The paper introduces a method that integrates a structured explicit memory module into LLMs to improve retrieval of infrequent and time-sensitive information.
  • It leverages relational triples and vector similarity indexing to manage dynamic memory-read and memory-write operations efficiently.
  • Evaluation on the DocRED dataset shows reduced perplexity and a balanced recall-precision trade-off, enhancing factual adherence and reducing hallucinations.

MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory

The paper "MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory" (2404.11672) introduces a method for augmenting LLMs with a structured explicit memory module. This approach aims to address limitations in current LLMs related to the handling of infrequent knowledge and temporal degradation due to the reliance on implicit parametric storage.

Motivation and Background

While LLMs are proficient at numerous knowledge-intensive tasks, they often fall short when recalling infrequent or temporally sensitive information. Prior approaches such as parametric memory pools, model editing, and retrieval-augmented generation (RAG) address parts of the problem, but each has limitations, including parametric distortions, interpretability issues, and inefficient retrieval. In contrast, the proposed MemLLM method integrates a structured read-write memory component that allows efficient and dynamic interaction, aiming to enhance both performance and interpretability.

Methodology

The core innovation of MemLLM is the integration of a structured memory component with which the LLM can interact explicitly. This memory is formatted as relational triples, which facilitates the LLM's understanding and manipulation of stored information. Memory operations are controlled via a dedicated API that exposes memory-read and memory-write commands during inference, as sketched below.
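The following is a minimal sketch of how such an API could be dispatched at inference time; the command names `MEM_READ`/`MEM_WRITE`, the `>>` field separator, and the parsing logic are illustrative assumptions rather than the paper's exact interface.

```python
# Minimal sketch of an inference-time memory API dispatcher.
# Command names and the ">>" separator are illustrative assumptions.
import re

class TripleMemory:
    def __init__(self):
        self.triples = set()  # {(subject, relation, object), ...}

    def write(self, subject, relation, obj):
        self.triples.add((subject, relation, obj))

    def read(self, subject=None, relation=None, obj=None):
        # Return all stored triples matching the non-None query fields.
        return [t for t in self.triples
                if (subject is None or t[0] == subject)
                and (relation is None or t[1] == relation)
                and (obj is None or t[2] == obj)]

MEM_CMD = re.compile(r"MEM_(READ|WRITE)\((.*?)\)")

def dispatch(generated_text, memory):
    """Scan model output for memory commands and execute them."""
    results = []
    for op, args in MEM_CMD.findall(generated_text):
        fields = [a.strip() or None for a in args.split(">>")]
        fields = (fields + [None, None, None])[:3]
        if op == "WRITE":
            memory.write(*fields)
        else:  # READ
            results.extend(memory.read(*fields))
    return results
```

For example, dispatching `MEM_WRITE(Berlin >> capital of >> Germany)` stores one triple, and a later `MEM_READ(Berlin >> capital of)` retrieves it.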

Memory Structure

The structured memory is organized as triples $\langle e_s, t, e_o \rangle$, where $e_s$ is the subject, $t$ is the relation, and $e_o$ is the object. The system relies on efficient retrieval techniques using vector similarities to locate relevant memory entries, leveraging known methods like Hierarchical Navigable Small World (HNSW) graph-based indexing.
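A minimal sketch of such a triple store with similarity-based lookup is shown below; for readability it uses brute-force cosine similarity over NumPy arrays instead of an HNSW index, and the `embed` callable (any phrase encoder mapping strings to vectors) is an assumption, not the paper's retriever.

```python
# Sketch of similarity-based triple retrieval (brute-force stand-in for HNSW).
import numpy as np

def cosine_top_k(query_vec, matrix, k=5):
    # Rank rows of `matrix` by cosine similarity to `query_vec`.
    matrix_norm = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    query_norm = query_vec / np.linalg.norm(query_vec)
    scores = matrix_norm @ query_norm
    top = np.argsort(-scores)[:k]
    return top, scores[top]

class VectorTripleIndex:
    def __init__(self, embed):
        self.embed = embed                 # callable: str -> np.ndarray
        self.triples, self.vectors = [], []

    def add(self, subject, relation, obj):
        self.triples.append((subject, relation, obj))
        # Index each triple by an embedding of its surface form.
        self.vectors.append(self.embed(f"{subject} {relation} {obj}"))

    def query(self, text, k=5):
        if not self.triples:
            return []
        idx, _ = cosine_top_k(self.embed(text), np.stack(self.vectors), k)
        return [self.triples[i] for i in idx]
```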

Figure 1: The prompt for the distant supervision dataset filtering. This prompt includes the natural representation of the relation, the reasoning, and the final answer.
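As an illustration only, the hypothetical template below mirrors the structure the caption describes (a verbalized candidate relation, a reasoning step, then a final yes/no answer); the wording is assumed, not taken from the paper.

```python
# Hypothetical filtering-prompt template; the model continues after
# "Reasoning:" and ends with a yes/no verdict on whether the sentence
# actually expresses the distantly supervised relation.
FILTER_PROMPT = """\
Sentence: {sentence}
Candidate fact: {subject} {relation_verbalized} {object}.
Question: Does the sentence explicitly express this fact?
Reasoning:"""
```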

Memory-read and Memory-write Operations

  • Memory Write: The LLM assesses each sentence in the text for potential relational information to store. Each sentence undergoes processing to extract relevant triples, which are then written to the memory using a defined API command.
  • Memory Read: When generating text, the model can invoke memory-read operations to query relevant stored information. These queries are generated dynamically in response to the context, and the retrieved information supports more factually accurate generation (a decoding-loop sketch follows this list).
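A rough sketch of how memory reads could be interleaved with decoding is shown below; the `generate_until` interface, the trigger string, and the result format are assumptions for illustration, not the paper's exact inference protocol, and `memory_index` is the triple index sketched earlier.

```python
# Sketch of interleaving memory reads with decoding: when the model emits a
# read trigger, the query is run against the triple store and the results are
# appended to the context before decoding resumes.
READ_TRIGGER = "MEM_READ("

def generate_with_memory(model, memory_index, prompt, max_rounds=8):
    context = prompt
    for _ in range(max_rounds):
        # Decode until the model finishes or starts a memory-read call.
        chunk, hit_trigger = model.generate_until(context, stop=READ_TRIGGER)
        context += chunk
        if not hit_trigger:
            break                                  # normal end of generation
        # Let the model complete the query, e.g. "Berlin >> capital of".
        query, _ = model.generate_until(context + READ_TRIGGER, stop=")")
        retrieved = memory_index.query(query.strip())
        rendered = "; ".join(" >> ".join(t) for t in retrieved)
        # Feed the retrieved triples back into the context and keep decoding.
        context += READ_TRIGGER + query + ") --> " + rendered + "\n"
    return context
```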

Evaluation

The performance of MemLLM was assessed on the DocRED dataset, known for its comprehensive relation annotations derived from Wikipedia. In experiments, MemLLM outperformed models without explicit memory, particularly on entity-related perplexity. Notably, correct use of the augmented memory improved target-entity prediction, indicating better factual adherence and reduced hallucination risk.

Performance Analysis:

  • Perplexity Reduction: MemLLM achieved significant reductions in perplexity on entity targets, showcasing the read-write memory's efficacy in improving LLM accuracy (a sketch of this metric follows the list).
  • Recall and Precision: By refining the distant supervision input and filtering strategies, the model gained recall without losing precision, allowing for richer relational context processing.
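As a concrete reading of the perplexity metric above, the sketch below computes perplexity restricted to target-entity token positions; the input format (per-token log-probabilities plus a 0/1 entity mask) is an assumption, not the paper's evaluation code.

```python
# Entity-targeted perplexity: average negative log-likelihood over only the
# token positions belonging to target entity mentions, then exponentiate.
import math

def entity_perplexity(token_logprobs, entity_mask):
    assert len(token_logprobs) == len(entity_mask)
    picked = [lp for lp, m in zip(token_logprobs, entity_mask) if m]
    if not picked:
        return float("nan")
    return math.exp(-sum(picked) / len(picked))

# Example: lower values mean the model is more confident on entity tokens.
# entity_perplexity([-0.1, -2.3, -0.4], [0, 1, 1]) == exp(1.35) ≈ 3.86
```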

Conclusions and Future Work

MemLLM represents a robust step towards more capable and interpretable LLMs by incorporating a structured memory mechanism. This explicit memory format not only allows scalable knowledge management but also enhances the model's adaptability across tasks requiring reliable knowledge retention over time.

Future research could adapt this framework to more diverse knowledge domains or explore alternative relational schemas that support complex or nested knowledge structures, extending the current methodology to broader and more dynamic real-world settings.
