
Abstract

Despite their impressive performance on diverse tasks, large language models (LMs) still struggle with tasks requiring rich world knowledge, implying the limitations of relying solely on their parameters to encode a wealth of world knowledge. This paper aims to understand LMs' strengths and limitations in memorizing factual knowledge, by conducting large-scale knowledge probing experiments with 10 models and 4 augmentation methods on PopQA, our new open-domain QA dataset with 14k questions. We find that LMs struggle with less popular factual knowledge, and that scaling fails to appreciably improve memorization of factual knowledge in the long tail. We then show that retrieval-augmented LMs largely outperform orders of magnitude larger LMs, while unassisted LMs remain competitive on questions about high-popularity entities. Based on these findings, we devise a simple yet effective method for powerful and efficient retrieval-augmented LMs, which retrieves non-parametric memories only when necessary. Experimental results show that this significantly improves models' performance while reducing inference costs.
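The adaptive strategy described above, calling the retriever only when the LM's parametric memory is unlikely to suffice, can be read as a popularity-gated pipeline: answer directly for high-popularity entities, retrieve for long-tail ones. The sketch below is a minimal illustration under that reading, not the paper's implementation; the fixed threshold, the popularity value supplied by the caller, and the `generate`/`retrieve` callables are all assumptions.

```python
from typing import Callable, List

def adaptive_answer(
    question: str,
    entity_popularity: float,              # e.g., a page-view count (assumed proxy)
    generate: Callable[[str], str],        # LM completion function (assumed)
    retrieve: Callable[[str], List[str]],  # passage retriever (assumed)
    popularity_threshold: float = 1000.0,  # illustrative cutoff, not from the paper
) -> str:
    """Answer with the LM alone for popular entities; otherwise prepend retrieved passages."""
    if entity_popularity >= popularity_threshold:
        # High-popularity entity: parametric memory is likely reliable,
        # so skip retrieval and save inference cost.
        prompt = f"Q: {question}\nA:"
    else:
        # Long-tail entity: augment the prompt with non-parametric memory.
        context = "\n".join(retrieve(question))
        prompt = f"Context: {context}\nQ: {question}\nA:"
    return generate(prompt)
```

Gating retrieval on a per-question popularity signal is what would let such a hybrid keep the unassisted LM's competitiveness on high-popularity entities while recovering long-tail accuracy from retrieval, which is also where the reduction in inference cost comes from.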
