Adaptation Approaches for Nearest Neighbor Language Models (2211.07828v2)

Published 15 Nov 2022 in cs.CL

Abstract: Semi-parametric Nearest Neighbor Language Models ($k$NN-LMs) have produced impressive gains over purely parametric LMs by leveraging large-scale neighborhood retrieval over external memory datastores. However, there has been little investigation into adapting such models for new domains. This work attempts to fill that gap and suggests the following approaches for adapting $k$NN-LMs: 1) adapting the underlying LM (using Adapters), 2) expanding neighborhood retrieval over an additional adaptation datastore, and 3) adapting the weights (scores) of retrieved neighbors using a learned Rescorer module. We study each adaptation strategy separately, as well as the combined performance improvement, through ablation experiments and an extensive set of evaluations run over seven adaptation domains. Our combined adaptation approach consistently outperforms purely parametric adaptation and zero-shot ($k$NN-LM) baselines that construct datastores from the adaptation data. On average, we see perplexity improvements of 17.1% and 16% for these respective baselines, across domains.
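For context, the abstract builds on the standard $k$NN-LM formulation (Khandelwal et al., "Generalization through memorization"), in which the parametric LM's next-token distribution is interpolated with a distribution induced by nearest-neighbor retrieval over a datastore of (context embedding, next token) pairs. The sketch below is a minimal illustration of how retrieval could additionally be expanded over a second, adaptation-domain datastore (approach 2 above); it is not the paper's implementation, the learned Rescorer (approach 3) and Adapter-based LM tuning (approach 1) are omitted, and names such as `base_store` and `adapt_store` are hypothetical.

```python
# Minimal sketch of kNN-LM interpolation with an extra adaptation datastore.
# Illustrative only: datastore names, merging strategy, and hyperparameters
# are assumptions, not the paper's actual implementation.
import numpy as np

def knn_probs(query, keys, values, vocab_size, temperature=1.0):
    """Turn retrieved neighbors into a next-token distribution:
    p_kNN(y) is proportional to the sum over neighbors whose stored
    next token is y of exp(-distance / temperature)."""
    dists = np.linalg.norm(keys - query, axis=1)      # L2 distance to each stored key
    scores = np.exp(-dists / temperature)             # closer neighbors get higher weight
    probs = np.zeros(vocab_size)
    np.add.at(probs, values, scores)                  # aggregate neighbor scores per token id
    return probs / probs.sum()

def adapted_knn_lm(p_lm, query, base_store, adapt_store, lam=0.25, vocab_size=100):
    """Interpolate the parametric LM distribution with kNN retrieval over the
    union of the original datastore and a (hypothetical) adaptation datastore."""
    keys = np.vstack([base_store["keys"], adapt_store["keys"]])
    values = np.concatenate([base_store["values"], adapt_store["values"]])
    p_knn = knn_probs(query, keys, values, vocab_size)
    return lam * p_knn + (1.0 - lam) * p_lm           # standard kNN-LM mixture

# Toy usage with random stand-ins for the LM's outputs and the datastores.
rng = np.random.default_rng(0)
vocab, dim = 100, 16
base_store = {"keys": rng.normal(size=(500, dim)), "values": rng.integers(0, vocab, 500)}
adapt_store = {"keys": rng.normal(size=(200, dim)), "values": rng.integers(0, vocab, 200)}
p_lm = rng.dirichlet(np.ones(vocab))                  # stand-in for the parametric LM's p(y|x)
query = rng.normal(size=dim)                          # stand-in for the LM's context embedding
p_final = adapted_knn_lm(p_lm, query, base_store, adapt_store)
print(p_final.sum())                                  # ~= 1.0
```

In an actual $k$NN-LM, the keys are hidden-state embeddings produced by the LM for each training context, the values are the observed next tokens, and retrieval uses an approximate-nearest-neighbor index such as FAISS rather than brute-force distances as above.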
