REALM: Retrieval-Augmented Language Model Pre-Training

(arXiv:2002.08909)
Published Feb 10, 2020 in cs.CL and cs.LG

Abstract

Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering. However, this knowledge is stored implicitly in the parameters of a neural network, requiring ever-larger networks to cover more facts. To capture knowledge in a more modular and interpretable way, we augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia, used during pre-training, fine-tuning and inference. For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner, using masked language modeling as the learning signal and backpropagating through a retrieval step that considers millions of documents. We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA). We compare against state-of-the-art models for both explicit and implicit knowledge storage on three popular Open-QA benchmarks, and find that we outperform all previous methods by a significant margin (4-16% absolute accuracy), while also providing qualitative benefits such as interpretability and modularity.

Overview

  • REALM introduces a novel approach to language model pre-training by augmenting it with a learned textual knowledge retriever, enabling the model to draw on external documents when making predictions.

  • The framework addresses the limitations of traditional language models by offering a scalable and interpretable way to store and recall knowledge in an external document corpus.

  • REALM achieves superior performance on Open-domain Question Answering benchmarks, demonstrating that it can effectively incorporate and exploit external knowledge.

  • The paper discusses the implications of REALM for future research, highlighting its potential for dynamic knowledge bases, exploration in various domains, and unsupervised alignments between learned representations and external knowledge.

Exploration of Retrieval-Augmented Language Model Pre-Training (REALM)

Introduction

The paper presents Retrieval-Augmented Language Model Pre-Training (REALM), a framework that augments language model pre-training with a learned textual knowledge retriever, and introduces an unsupervised method for training that retriever alongside the language model. This contrasts with traditional language models such as BERT, RoBERTa, and T5, which encapsulate knowledge implicitly within their parameters. REALM instead modularizes knowledge storage, making it both interpretable and scalable, by drawing on external documents at prediction time. The framework exhibits superior performance on Open-domain Question Answering (Open-QA) benchmarks, evidencing its capacity to incorporate and exploit external world knowledge effectively.

Background

The motivation behind REALM arises from the limits of storing knowledge implicitly in the network parameters of current language models. As these models are trained on extensive corpora, the amount of knowledge they can encapsulate grows only with network size, which makes the stored information difficult to scale, inspect, and update. The paper therefore argues for a more scalable and explicit mechanism for storing and recalling knowledge.

Approach

REALM decomposes the prediction of an output y given an input x into two distinct steps: retrieval and prediction. A neural knowledge retriever selects relevant documents from a large corpus such as Wikipedia, and a knowledge-augmented encoder then predicts the output from the input together with the retrieved documents. The model is trained by maximizing the marginal likelihood of this generative process, so both the retriever and the encoder are updated through backpropagation. The central challenge, and the paper's key contribution, is backpropagating through a retrieval step that spans millions of documents; this is made tractable by approximating retrieval with Maximum Inner Product Search (MIPS) over a cached, periodically refreshed index of document embeddings.
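
Concretely, the paper treats the retrieved document z as a latent variable and maximizes the marginal likelihood of the output over the knowledge corpus Z. A sketch of the formulation, following the paper's notation:

    p(y \mid x) = \sum_{z \in \mathcal{Z}} p(y \mid z, x) \, p(z \mid x)

    p(z \mid x) = \frac{\exp f(x, z)}{\sum_{z'} \exp f(x, z')}, \qquad f(x, z) = \mathrm{Embed_{input}}(x)^{\top} \mathrm{Embed_{doc}}(z)

Because summing over the full corpus is intractable, the sum is approximated by the top-k documents under f(x, z), which is precisely the Maximum Inner Product Search problem; and because p(z | x) depends on the embedding functions, the masked-language-modeling loss backpropagates into the retriever as well as the encoder.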

Experiments and Results

REALM demonstrates outstanding performance when fine-tuned on Open-QA tasks, surpassing state-of-the-art models on popular benchmarks such as Natural Questions-Open, WebQuestions, and CuratedTrec, with gains of 4 to 16 percentage points in absolute accuracy. These results are a strong indicator of REALM's enhanced capability to incorporate and exploit external knowledge effectively.

Implications and Future Directions

The demonstrated ability of REALM to utilize external documents in language model pre-training suggests several promising directions for future research. The modular knowledge approach opens up the possibility of dynamic knowledge bases that can be updated without retraining the model from scratch, improving the model's adaptability to new information (see the sketch below). Furthermore, the successful integration of retrieval not only at inference time but also during pre-training paves the way for extensions to other settings such as structured knowledge bases, multimedia data, and multilingual corpora.
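
As a minimal illustration of that modularity, the sketch below (hypothetical helper names, not the authors' code, with the document encoder stubbed out) re-embeds an updated document collection with a frozen encoder and rebuilds a brute-force inner-product index, so that new facts become retrievable without touching the model weights:

    import numpy as np

    def fake_embed(text, dim=128):
        # Stand-in for REALM's frozen document encoder (a BERT-style model in
        # the paper); here just a deterministic pseudo-random vector per text.
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        return rng.normal(size=dim)

    def rebuild_index(documents, embed_doc):
        # Re-embed an (updated) corpus and stack the vectors into a dense
        # matrix that acts as a brute-force inner-product (MIPS) index.
        return np.stack([embed_doc(d) for d in documents])

    def retrieve_top_k(query_vec, index, k=5):
        # Score every document by inner product with the query embedding and
        # return the indices and scores of the k best matches.
        scores = index @ query_vec
        top = np.argsort(-scores)[:k]
        return top, scores[top]

    # Swapping in an updated corpus only requires re-running rebuild_index;
    # the model weights are untouched.
    docs = ["Paris is the capital of France.",
            "REALM retrieves documents during pre-training."]
    index = rebuild_index(docs, fake_embed)
    query_vec = fake_embed("capital of France")  # REALM uses a separate input encoder
    ids, scores = retrieve_top_k(query_vec, index, k=1)
    print(docs[ids[0]], float(scores[0]))

In REALM itself the same idea appears during pre-training: document embeddings are cached and the MIPS index is refreshed asynchronously so that retrieval keeps pace with the evolving encoders.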

Another intriguing aspect is the unsupervised alignment the retriever learns between the pre-training corpus and the knowledge corpus. These alignments offer a new lens through which to analyze and interpret the interactions between learned representations and external knowledge sources.

Summary

In sum, Retrieval-Augmented Language Model Pre-Training (REALM) marks a significant step forward in the unsupervised pre-training of language models. By combining the strengths of neural retrievers with the rich representational capabilities of modern language models, REALM not only pushes the boundaries of what is achievable in Open-QA but also opens new avenues for research in knowledge-intensive applications of AI. By allowing external knowledge to be updated and diversified without retraining, the framework offers a robust approach to the challenges of scalability and adaptability in how neural networks store knowledge.
