Training Language Models with Memory Augmentation

(2205.12674)
Published May 25, 2022 in cs.CL and cs.LG

Abstract

Recent work has improved language models (LMs) remarkably by equipping them with a non-parametric memory component. However, most existing approaches only introduce memories at testing time or represent them using a separately trained encoder, resulting in suboptimal training of the language model. In this work, we present TRIME, a novel yet simple training approach designed for training LMs with memory augmentation. Our approach uses a training objective that directly takes in-batch examples as accessible memory. We also present new methods for memory construction and data batching, which are used for adapting to different sets of memories (local, long-term, and external memory) at testing time. We evaluate TRIME on multiple language modeling and machine translation benchmarks and show that it is able to achieve significant improvements across all the settings. Concretely, TRIME reduces the perplexity from 18.70 to 15.37 on WIKITEXT-103, by effectively leveraging a large memory set from the training corpus. Compared to standard LM training, TRIME adds negligible computational overhead and is compatible with different neural architectures, making it a versatile solution for training memory-augmented LMs.

Overview

  • This paper introduces TRIME, a novel approach for optimizing language models by integrating memory augmentation directly into the training process, enhancing the model's ability to use contextual information.

  • TRIME's core contribution is a unique training objective that leverages in-batch examples as accessible memory units, improving the model's interaction with different types of memories.

  • The model significantly outperformed existing approaches in empirical evaluations, particularly on the WikiText-103 dataset, where it reduced the perplexity from 18.70 to 15.37.

  • TRIME advances the understanding of memory utilization in language models, suggesting a holistic approach to integrating memory mechanisms into the training process.

Training Language Models with Memory Augmentation: A Comprehensive Study

Introduction to Memory Augmentation in Language Models

Recent advancements in language models (LMs) have focused on integrating non-parametric memory components, enhancing a model's ability to capture and leverage contextual information from large datasets. This paper details TRIME (Training with In-batch Memories), a novel approach designed to optimize language models by integrating memory augmentation directly into the training process. Unlike traditional methods that introduce memories only at test time or rely on a separately trained encoder for memory representation, TRIME introduces a training objective, together with methods for memory construction and data batching, that improves the model's interaction with local, long-term, and external memories during both training and testing.

Core Contributions and Methodology

The paper's central contribution is a training objective that treats in-batch examples as accessible memory units. Inspired by contrastive representation learning, the objective aligns the hidden representation of the target token with both its output embedding and a set of in-batch contextualized representations. This design handles rare words gracefully: when no in-batch memory contains the target token, the objective falls back to the word embedding alone. It also trains the model to exploit contextual information more effectively than a standard language modeling objective does.
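
To make this concrete, below is a minimal PyTorch-style sketch of a TRIME-like loss, written from the description above rather than from the paper's released code. Function and tensor names (trime_loss, mem_keys, mem_labels) are illustrative, and details such as temperature scaling and excluding the current position from the memory set are omitted.

```python
import torch
import torch.nn.functional as F

def trime_loss(hidden, targets, embedding, mem_keys, mem_labels):
    """Sketch of a TRIME-style objective over in-batch memories.

    hidden:     (B, d) hidden states used to predict the next token
    targets:    (B,)   gold next-token ids
    embedding:  (V, d) output token embedding matrix
    mem_keys:   (M, d) in-batch contextualized representations (memory keys)
    mem_labels: (M,)   the token that followed each memory position
    """
    vocab_logits = hidden @ embedding.t()   # (B, V) scores against token embeddings
    mem_logits = hidden @ mem_keys.t()      # (B, M) scores against in-batch memories
    log_probs = F.log_softmax(torch.cat([vocab_logits, mem_logits], dim=-1), dim=-1)

    # Positives: the target token's own embedding, plus every in-batch memory
    # whose next token equals the target. If no memory matches (e.g. a rare
    # word), the loss falls back to the embedding term alone.
    pos_vocab = F.one_hot(targets, num_classes=embedding.size(0)).bool()   # (B, V)
    pos_mem = targets.unsqueeze(1) == mem_labels.unsqueeze(0)              # (B, M)
    positives = torch.cat([pos_vocab, pos_mem], dim=-1)

    # Negative log of the total probability mass assigned to the positives.
    pos_logprob = torch.logsumexp(
        log_probs.masked_fill(~positives, float("-inf")), dim=-1)
    return -pos_logprob.mean()
```

In this sketch, mem_keys and mem_labels would be gathered from other positions in the batch during training, and the same scoring rule can in principle be applied at test time to a much larger memory set.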

Particularly notable is the paper's introduction of three memory types:

  • Local Memory: The immediately preceding words in the current input, accessed through the model's attention mechanism.
  • Long-term Memory: Captures context from the same document but outside the direct reach of attention due to input length constraints.
  • External Memory: Used to store vast amounts of data from the training corpus or additional datasets.

For each memory type, TRIME proposes data batching strategies to efficiently construct and leverage these memories during training. Placing consecutive segments of the same document within a single batch lets the model access long-term memories beyond its immediate context, while batching lexically similar segments from different documents serves as a training-time proxy for external memory and improves the model's generalization.
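
As a rough illustration of these batching strategies, the sketch below groups pre-segmented text in two ways. The names batch_same_document, batch_similar_segments, and the retrieve_similar hook are hypothetical; the actual implementation (e.g. BM25 indexing and batch-size balancing) involves more machinery than shown.

```python
def batch_same_document(doc_segments, segs_per_batch):
    """Sketch: pack consecutive segments of one document into the same batch,
    so earlier segments can serve as long-term memory for later ones."""
    return [doc_segments[i:i + segs_per_batch]
            for i in range(0, len(doc_segments), segs_per_batch)]


def batch_similar_segments(segments, retrieve_similar, segs_per_batch):
    """Sketch: group lexically similar segments from different documents as a
    training-time proxy for external memory. retrieve_similar(seg_id) is an
    assumed retrieval hook (e.g. backed by BM25), not a specific library API."""
    remaining = set(range(len(segments)))
    batches = []
    while remaining:
        seed = remaining.pop()
        neighbors = [j for j in retrieve_similar(seed) if j in remaining]
        batch_ids = [seed] + neighbors[:segs_per_batch - 1]
        remaining.difference_update(batch_ids)
        batches.append([segments[j] for j in batch_ids])
    return batches
```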

Empirical Evaluations and Results

The TRIME model underwent extensive evaluation across multiple benchmarks, including language modeling and machine translation tasks. It significantly outperformed baseline models and existing approaches. For instance, on the WikiText-103 dataset, TRIME reduced the perplexity from 18.70 to 15.37 by efficiently utilizing large memory sets from the training corpus. This improvement was achieved with negligible computational overhead, underscoring TRIME's efficiency and scalability.

Theoretical Implications and Future Perspectives

Beyond its immediate performance gains, TRIME's approach opens new avenues for research into memory-augmented language models. By seamlessly integrating memory mechanisms into the training process, TRIME advances our understanding of how models can effectively leverage vast amounts of contextual data. It challenges the prevailing focus on post-hoc memory integration and standalone memory encoders, suggesting a more holistic approach to memory utilization in language models.

Conclusion

The TRIME model redefines the landscape of memory-augmented language modeling by embedding memory mechanisms directly into the training process. Its ability to harness local, long-term, and external memories without significant computational penalties marks a substantial step forward in the development of more efficient, context-aware language models. As such, TRIME not only achieves state-of-the-art results across several benchmarks but also lays the groundwork for future exploration of memory integration techniques in AI.
