Emergent Mind


Retrieval-augmented language models pose a promising alternative to standard language modeling. During pretraining, these models search in a corpus of documents for contextually relevant information that could aid the language modeling objective. We introduce an 'ideal retrieval' methodology to study these models in a fully controllable setting. We conduct an extensive evaluation to examine how retrieval augmentation affects the behavior of the underlying language model. Among other things, we observe that these models: i) save substantially less world knowledge in their weights, ii) are better at understanding local context and inter-word dependencies, but iii) are worse at comprehending global context.

The figure shows effects of lacking retrieval pretraining in a base model.


  • The paper investigates the impact of retrieval-augmented pretraining on language models, showing that while world knowledge retention suffers, syntactic processing improves.

  • An 'ideal retrieval' scenario was tested to analyze how retrieval quality affects language processing independently, using different levels of retrieval noise.

  • Findings indicate a trade-off between storing factual world knowledge and improved syntactic ability, while broader language understanding declines on tasks requiring extensive internal reasoning.

Exploration of Retrieval-Augmented Pretraining for Language Models


Retrieval-augmented language models leverage both self-supervised learning and external information retrieval to enhance their ability to generate contextually relevant responses. These models integrate a nonparametric memory in the form of data retrieval during the token prediction process in training, which theoretically aids the model by providing additional context from a knowledge database. Several studies have shown the efficacy of these models in specific tasks like open-domain question answering. However, their impact on the core functionalities and behaviors of the underlying language models, when isolated from the retrieval components, is less studied.


The paper introduces a structured methodology to evaluate the intrinsic capabilities of language models trained with retrieval augmentation, using a controlled setting. The authors propose an "ideal retrieval" scenario where retrieval is simulated using paraphrases, enabling a cleaner analysis by removing the variability that comes with different retrieval mechanisms or databases. This approach allows for an examination of the impact of pure retrieval augmentation on language processing, independent of the quality of the retrieval data. The models tested include variations with different levels of retrieval noise (0%, 25%, 50%) to simulate varying levels of retrieval quality.
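The noisy "ideal retrieval" setup described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `build_retrieval_pairs`, the `paraphrase` callable, and the choice to model noise as swapping in the paraphrase of a different, randomly chosen passage are all hypothetical.

```python
import random
from typing import Callable, List, Tuple


def build_retrieval_pairs(
    passages: List[str],
    paraphrase: Callable[[str], str],
    noise: float,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Pair each passage with a simulated 'retrieved' context.

    With probability `noise`, the context is the paraphrase of a different,
    randomly chosen passage (assumed noise model); otherwise it is the
    paraphrase of the passage itself (the 'ideal' retrieval).
    """
    if not 0.0 <= noise <= 1.0:
        raise ValueError("noise must be in [0, 1]")
    rng = random.Random(seed)
    pairs = []
    for i, passage in enumerate(passages):
        if len(passages) > 1 and rng.random() < noise:
            # Draw an index from the other passages, skipping index i.
            j = rng.randrange(len(passages) - 1)
            if j >= i:
                j += 1
            context = paraphrase(passages[j])
        else:
            context = paraphrase(passage)
        pairs.append((passage, context))
    return pairs


if __name__ == "__main__":
    # Toy stand-in for a paraphrase model (hypothetical).
    toy_paraphrase = lambda text: text.upper()
    corpus = ["the cat sat", "rain fell all day", "she wrote a letter"]
    for level in (0.0, 0.25, 0.5):
        print(level, build_retrieval_pairs(corpus, toy_paraphrase, level))
```

Calling the function once per noise level (0.0, 0.25, 0.5) would yield one training set per model variant, which mirrors the three settings mentioned above.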


World Knowledge

Models trained with retrieval augmentation showed lower performance on world-knowledge tasks, such as the cloze tests from LAMA, indicating that these models store less factual world knowledge in their weights. The degradation grew as retrieval noise decreased, suggesting an inverse relationship between reliance on retrieval and the amount of world knowledge retained in the model itself.
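To make the cloze-style measurement concrete, here is a minimal sketch of scoring a model on fill-in-the-blank facts. It assumes a simplified exact-match precision-at-1 metric and a hypothetical `predict` callable that returns the model's top guess for the masked token; the actual LAMA evaluation pipeline is more involved.

```python
from typing import Callable, Iterable, Tuple


def cloze_precision_at_1(
    examples: Iterable[Tuple[str, str]],
    predict: Callable[[str], str],
) -> float:
    """Fraction of cloze prompts whose top prediction matches the answer.

    `examples` holds (prompt_with_mask, gold_answer) pairs. Matching is
    case-insensitive exact match, a simplifying assumption.
    """
    total = 0
    correct = 0
    for prompt, answer in examples:
        total += 1
        if predict(prompt).strip().lower() == answer.strip().lower():
            correct += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Toy predictor standing in for a language model (hypothetical).
    canned = {
        "The capital of France is [MASK].": "paris",
        "Dante was born in [MASK].": "Rome",
    }
    facts = [
        ("The capital of France is [MASK].", "Paris"),
        ("Dante was born in [MASK].", "Florence"),
    ]
    print(cloze_precision_at_1(facts, lambda p: canned[p]))
```

Running the same scoring on a baseline model and on retrieval-pretrained models, without giving them any retrieved context at test time, is one way to compare how much factual knowledge sits in the weights.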

Syntactic Knowledge

In contrast, syntactic understanding improved consistently across models trained with retrieval augmentation. This gain on syntactic tasks suggests that model capacity which would otherwise hold world knowledge may be reallocated to syntactic processing.

Language Understanding

The evaluation pointed to a decline in broader NLU capabilities, especially on tasks that require comprehending extended contexts, such as those in the GLUE and LAMBADA benchmarks. This decline suggests that while retrieval augmentation can offload some memory requirements to external databases, doing so may impair the model's ability to integrate and reason over longer texts internally.

Implications and Future Directions

The observed trade-off between world knowledge retention and syntactic processing efficiency raises critical considerations for the design of retrieval-augmented systems, particularly for applications requiring robust comprehension over extended contexts. The results suggest that while retrieval augmentation can optimize models for specific functionalities, such as syntax parsing, it may not be suitable for tasks requiring extensive internal reasoning and knowledge integration.

Future research could extend these findings by exploring different configurations of retrieval-augmented systems and their impacts on a broader range of linguistic and cognitive capabilities in language models. Additionally, studies could investigate the scaling effects of these models to understand how these dynamics play out in larger, more complex systems.

Practical and Theoretical Contributions

From a practical standpoint, these insights could guide the development of more specialized language models that either focus on efficient syntactic processing or comprehensive world-knowledge retention based on the needs of the application. Theoretically, this work contributes to our understanding of how external memory aids, such as retrieval systems, interact with the intrinsic learning capabilities of neural models, potentially paving the way for more modular and adaptable AI systems.

