Generalization through Memorization: Nearest Neighbor Language Models (1911.00172v2)

Published 1 Nov 2019 in cs.CL

Abstract: We introduce $k$NN-LMs, which extend a pre-trained neural language model (LM) by linearly interpolating it with a $k$-nearest neighbors ($k$NN) model. The nearest neighbors are computed according to distance in the pre-trained LM embedding space, and can be drawn from any text collection, including the original LM training data. Applying this augmentation to a strong Wikitext-103 LM, with neighbors drawn from the original training set, our $k$NN-LM achieves a new state-of-the-art perplexity of 15.79 - a 2.9 point improvement with no additional training. We also show that this approach has implications for efficiently scaling up to larger training sets and allows for effective domain adaptation, by simply varying the nearest neighbor datastore, again without further training. Qualitatively, the model is particularly helpful in predicting rare patterns, such as factual knowledge. Together, these results strongly suggest that learning similarity between sequences of text is easier than predicting the next word, and that nearest neighbor search is an effective approach for language modeling in the long tail.

Citations (736)

Summary

  • The paper's main contribution is introducing k-NN-LMs that combine neural language models with k-nearest neighbor retrieval to improve generalization.
  • It details a two-stage process that builds a memory of context embeddings and retrieves neighbors during inference to lower perplexity.
  • Experimental results on benchmarks demonstrate significant perplexity reductions and enhanced adaptation to novel and rare contexts.

Generalization through Memorization: Nearest Neighbor Language Models

The paper "Generalization through Memorization: Nearest Neighbor LLMs" by Khandelwal et al. explores the integration of Nearest Neighbor (k-NN) techniques into LMs to improve their ability to generalize. The authors present a hybrid approach that leverages the strengths of both traditional neural LLMs and memory-based methods.

Summary of Contributions

The primary contribution of this work is the introduction of k-NN-LMs, which augment a pre-trained neural language model with a k-nearest neighbors retrieval mechanism. Specifically, the approach stores a key-value pair for every token position in the training data, where keys are context embeddings produced by the pre-trained LM and values are the corresponding next tokens. During inference, the model retrieves the nearest neighbors from this memory to inform its predictions.
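
As a rough illustration of this key-value construction, the NumPy sketch below builds such a memory from a toy token sequence. The embed_context function is a hypothetical stand-in for the pre-trained LM's context encoder (the paper uses the hidden states of a Transformer LM and a FAISS index for nearest-neighbor search); it is not the paper's actual implementation.

```python
import numpy as np

def embed_context(prefix_tokens, dim=16):
    """Hypothetical stand-in for the pre-trained LM's context encoder:
    maps a token prefix to a fixed-size vector. The paper instead uses
    the hidden state of a Transformer language model."""
    rng = np.random.default_rng(len(" ".join(prefix_tokens)))
    return rng.standard_normal(dim)

def build_datastore(corpus_tokens):
    """Build the key-value memory: for each position i, the key is the
    embedding of the context corpus_tokens[:i] and the value is corpus_tokens[i]."""
    keys, values = [], []
    for i in range(1, len(corpus_tokens)):
        keys.append(embed_context(corpus_tokens[:i]))
        values.append(corpus_tokens[i])
    return np.stack(keys), values

# Toy usage: a five-entry datastore built from a six-token "training set".
corpus = ["the", "cat", "sat", "on", "the", "mat"]
keys, values = build_datastore(corpus)
print(keys.shape, values)   # (5, 16) ['cat', 'sat', 'on', 'the', 'mat']
```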

Methodology

The k-NN-LM operates in two main stages:

  1. Memory Augmentation: The memory consists of key-value pairs constructed from the training dataset. Keys are high-dimensional vectors derived from the context representations, and values are the corresponding next tokens.
  2. Inference via Retrieval: At inference time, the model retrieves the closest matching context embeddings from the memory. The distribution over the retrieved tokens (values) is interpolated with the probability distribution of the base neural language model to produce the final prediction.

Formally, the prediction of the k-NN-LM for the next token $w_i$ given context $h_i$ is a linear interpolation of the base LM probability and the nearest-neighbor probability: $P(w_i \mid h_i) = \lambda \, P_{kNN}(w_i \mid h_i) + (1 - \lambda) \, P_{LM}(w_i \mid h_i)$, where $P_{kNN}$ is derived from the retrieved neighbors (a softmax over their negative distances to the query, aggregated by target token) and $\lambda$ is a tuned interpolation weight.
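
A self-contained sketch of this interpolation under toy assumptions: the datastore keys, the query embedding, and the base-LM distribution p_lm below are random or hand-written placeholders rather than outputs of the real model, and the softmax over negative L2 distances with a fixed weight lam simply mirrors the description above.

```python
import numpy as np
from collections import defaultdict

def knn_lm_probs(query, keys, values, p_lm, k=3, lam=0.25):
    """P(w | h) = lam * P_kNN(w | h) + (1 - lam) * P_LM(w | h)."""
    dists = np.linalg.norm(keys - query, axis=1)        # L2 distance to every stored key
    nearest = np.argsort(dists)[:k]                     # indices of the k nearest keys
    w = np.exp(-dists[nearest])                         # softmax over negative distances
    w /= w.sum()
    p_knn = defaultdict(float)
    for idx, weight in zip(nearest, w):
        p_knn[values[idx]] += weight                    # aggregate weight per retrieved token
    vocab = set(p_lm) | set(p_knn)
    return {t: lam * p_knn[t] + (1.0 - lam) * p_lm.get(t, 0.0) for t in vocab}

# Toy inputs: random keys, hand-written values, and a fake base-LM distribution.
rng = np.random.default_rng(0)
keys = rng.standard_normal((5, 16))
values = ["cat", "sat", "on", "the", "mat"]
query = rng.standard_normal(16)                         # stand-in for the current context embedding
p_lm = {"mat": 0.4, "hat": 0.6}
print(knn_lm_probs(query, keys, values, p_lm))
```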

Experimental Results

The authors conducted extensive experiments on the Wikitext-103 language modeling benchmark, along with studies of scaling to larger training sets and of domain adaptation by varying the nearest-neighbor datastore, demonstrating the efficacy of the proposed k-NN-LM. Key findings include:

  • Perplexity Reduction: The k-NN-LM achieves significant reductions in perplexity compared to strong baseline models. On the Wikitext-103 dataset, it reaches a perplexity of 15.79, a 2.9-point improvement over the base LM and a new state of the art at the time, obtained without any additional training.
  • Adaptive Generalization: The model effectively adapts to novel contexts by leveraging the memory component, providing a robust mechanism for generalization through memorization. This is particularly evident for rare patterns, such as factual knowledge, where retrieving similar training contexts is easier than predicting the next word from model parameters alone.

Implications

The integration of k-NN retrieval mechanisms into language models has several noteworthy implications:

  • Enhanced Memory Capacity: By storing comprehensive representations of the training data, k-NN-LMs can recall and utilize specific contexts more effectively than traditional LMs.
  • Dynamic Adaptation: The model's ability to incorporate nearest neighbors at inference time enables it to adapt to changes in data distribution simply by changing the datastore, without the need for retraining (see the toy sketch after this list).
  • Scalability Concerns: While memory-based models show promise, they also pose challenges related to memory storage and retrieval efficiency, particularly for large-scale datasets.
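
To make the "Dynamic Adaptation" point above concrete, the toy sketch below swaps one datastore for another built from a different text collection while the frozen model that would produce the context embeddings is left untouched. This mirrors the domain-adaptation setup described in the abstract, but the data and the 1-nearest-neighbor lookup here are purely illustrative placeholders.

```python
import numpy as np

def nearest_value(query, keys, values):
    """Return the value stored under the key closest to the query (k = 1)."""
    return values[int(np.argmin(np.linalg.norm(keys - query, axis=1)))]

rng = np.random.default_rng(1)
dim = 16

# Two datastores built from different text collections; only the datastore
# changes at inference time, the language model itself is never retrained.
wiki_keys,  wiki_values  = rng.standard_normal((4, dim)), ["born", "in", "1879", "."]
books_keys, books_values = rng.standard_normal((4, dim)), ["said", "she", "softly", "."]

query = rng.standard_normal(dim)   # stand-in for the LM's embedding of the current context
print(nearest_value(query, wiki_keys, wiki_values))     # retrieval from the Wikipedia-style store
print(nearest_value(query, books_keys, books_values))   # same query, book-domain store swapped in
```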

Theoretical and Practical Considerations

From a theoretical perspective, the k-NN-LM represents a significant step toward bridging the gap between memory-based models and neural approaches. The model underscores the importance of balancing memorization with generalization in the design of language models.

Practically, the integration of k-NN mechanisms could inspire new directions in machine learning research, particularly in enhancing the adaptability and robustness of AI systems. Future work may explore more efficient memory retrieval techniques, as well as the application of k-NN-LMs across different domains and tasks.

Future Directions

Potential avenues for future research based on the findings of this paper include:

  • Memory Compression: Investigating techniques for compressing the memory storage to manage scalability issues.
  • Hybrid Architectures: Integrating k-NN retrieval more tightly with the underlying neural architecture (the base LM in this work is already a Transformer) to further improve performance.
  • Transfer Learning: Assessing the effectiveness of k-NN-LMs in transfer learning scenarios where models are fine-tuned on different but related tasks.

In conclusion, the paper by Khandelwal et al. contributes a novel perspective to language modeling by incorporating k-NN methods, yielding impressive empirical results and opening new paths for research in generalization and memory integration in AI systems.
