MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory (2404.11672v3)
Abstract: While current LLMs perform well on many knowledge-related tasks, they are limited by relying on their parameters as an implicit storage mechanism. As a result, they struggle with memorizing rare events and with updating their memory as facts change over time. In addition, the uninterpretable nature of parametric memory makes it challenging to prevent hallucination. Model editing and augmenting LLMs with parameters specialized for memory are only partial solutions. In this paper, we introduce MemLLM, a novel method of enhancing LLMs by integrating a structured and explicit read-and-write memory module. MemLLM tackles the aforementioned challenges by enabling dynamic interaction with the memory and improving the LLM's capabilities in using stored knowledge. Our experiments indicate that MemLLM enhances the LLM's performance and interpretability, in language modeling in general and in knowledge-intensive tasks in particular. We see MemLLM as an important step towards making LLMs more grounded and factual through memory augmentation. The project repository is publicly available at https://github.com/amodaresi/MemLLM
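To make the idea of an explicit read-write memory concrete, below is a minimal sketch of the kind of memory module a finetuned LLM could drive through memory-write and memory-read calls. The triple-based schema, the `MEM_WRITE`/`MEM_READ` call names, and the `ExplicitMemory` class are assumptions of this illustration, not the interface defined in the paper or in the project repository linked above.

```python
# Minimal sketch of an explicit read-write memory an LLM could be finetuned to use.
# The (subject, relation, object) schema and the MEM_WRITE / MEM_READ call format
# are assumptions of this illustration, not the paper's exact interface.
from collections import defaultdict


class ExplicitMemory:
    """Stores (subject, relation, object) triples and answers simple queries."""

    def __init__(self):
        self.triples = set()
        self.index = defaultdict(set)  # (subject, relation) -> set of objects

    def write(self, subject: str, relation: str, obj: str) -> None:
        """Invoked when the model emits a memory-write request while reading text."""
        self.triples.add((subject, relation, obj))
        self.index[(subject, relation)].add(obj)

    def read(self, subject: str, relation: str) -> set[str]:
        """Invoked when the model emits a memory-read request; the returned objects
        would be appended to the model's context before generation continues."""
        return self.index.get((subject, relation), set())


if __name__ == "__main__":
    memory = ExplicitMemory()
    # While reading a document, the model might emit: MEM_WRITE(Ljubljana, capital_of, Slovenia)
    memory.write("Ljubljana", "capital_of", "Slovenia")
    # Later, while generating, it might emit: MEM_READ(Ljubljana, capital_of)
    print(memory.read("Ljubljana", "capital_of"))  # {'Slovenia'}
```

In such a setup, the memory is external, structured, and inspectable, so stored facts can be updated or audited directly rather than being entangled in model parameters.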