Fine Tuning vs. Retrieval Augmented Generation for Less Popular Knowledge (2403.01432v5)
Abstract: Language models (LMs) memorize a vast amount of factual knowledge, exhibiting strong performance across diverse tasks and domains. However, it has been observed that their performance diminishes when dealing with less-popular or low-frequency concepts and entities, for example in domain-specific applications. The two prominent approaches to enhance the performance of LMs on low-frequency topics are Retrieval Augmented Generation (RAG) and fine-tuning (FT) over synthetic data. This paper explores and evaluates the impact of RAG and FT on customizing LMs to handle low-frequency entities in question answering tasks. We conduct extensive experiments on twelve LMs of varying size and type, combined with different fine-tuning, data augmentation, and retrieval models. Our findings indicate that while FT boosts performance across entities of varying popularity, RAG surpasses FT by a large margin, particularly for the least popular factual knowledge. Additionally, the success of both RAG and FT is amplified by improving retrieval and data augmentation techniques. Fine-tuning, while beneficial for small LMs, requires extensive resources. To address this issue, we propose the new Stimulus RAG approach, which surpasses the effectiveness of fine-tuning based approaches, thereby eliminating the need for the costly data augmentation and fine-tuning step for enriching LMs with less-popular factual knowledge. The code is available at https://github.com/informagi/RAGvsFT.
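For readers unfamiliar with the RAG side of the comparison, the minimal Python sketch below shows the core loop the abstract describes: score a passage corpus against the question, keep the top passages, and prepend them to the question as grounding context before calling the LM. This is an illustration only; the toy corpus, question, and prompt template are assumptions, a classic BM25 scorer stands in for the paper's retrieval models, and the actual experimental pipeline (including the Stimulus RAG variant) lives in the linked repository.

```python
# Minimal RAG sketch: BM25 retrieval + context-augmented prompt assembly.
# Everything here (corpus, question, prompt format) is a hypothetical example.
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokenizer; good enough for a toy demo."""
    return re.findall(r"\w+", text.lower())

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc against the query with classic BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    # Document frequency: number of docs containing each term.
    df = Counter(t for d in tokenized for t in set(d))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def build_rag_prompt(question, corpus, top_k=2):
    """Retrieve the top_k passages and prepend them to the question."""
    scores = bm25_scores(question, corpus)
    top = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:top_k]
    context = "\n".join(corpus[i] for i in top)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Toy stand-in for a Wikipedia-like passage store about a less-popular entity.
corpus = [
    "Thomas Hardy was an English novelist born in 1840 in Dorset.",
    "The river Frome flows through Dorset in southern England.",
    "BM25 is a ranking function widely used by search engines.",
]
print(build_rag_prompt("Where was Thomas Hardy born?", corpus))
# The assembled prompt would then be passed to the LM. In the FT-only
# condition there is no retrieved context: the model must answer from
# parametric memory alone, which is where low-popularity entities suffer.
```

The design point this illustrates is the one the paper's findings turn on: RAG moves less-popular facts from the model's parameters into the prompt at inference time, so answer quality depends on retrieval quality rather than on how often the entity appeared in pre-training or fine-tuning data.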