Retrieval-Augmented Generation for Large Language Models: A Survey (2312.10997v5)

Published 18 Dec 2023 in cs.CL and cs.AI

Abstract: LLMs showcase impressive capabilities but encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This enhances the accuracy and credibility of the generation, particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and integration of domain-specific information. RAG synergistically merges LLMs' intrinsic knowledge with the vast, dynamic repositories of external databases. This comprehensive review paper offers a detailed examination of the progression of RAG paradigms, encompassing the Naive RAG, the Advanced RAG, and the Modular RAG. It meticulously scrutinizes the tripartite foundation of RAG frameworks, which includes the retrieval, the generation and the augmentation techniques. The paper highlights the state-of-the-art technologies embedded in each of these critical components, providing a profound understanding of the advancements in RAG systems. Furthermore, this paper introduces an up-to-date evaluation framework and benchmark. Finally, the article delineates the challenges currently faced and points out prospective avenues for research and development.

Summary

  • The paper surveys RAG, which integrates document retrieval with LLMs to reduce hallucinations and keep generated information current.
  • It outlines the evolution from Naive to Modular RAG, emphasizing enhanced retriever precision and modularity in processing pipelines.
  • The study compares RAG with fine-tuning and prompt engineering, highlighting future challenges in multimodal applications and retrieval efficiency.

Retrieval-Augmented Generation for LLMs: A Survey

Introduction to RAG

Retrieval-Augmented Generation (RAG) represents a significant advancement in enhancing the capabilities of LLMs by coupling their generative abilities with information retrieval techniques. This framework addresses limitations of LLMs, such as hallucination and outdated knowledge, by retrieving relevant document chunks from external databases. In doing so, RAG not only improves the accuracy and credibility of generated responses but also enables real-time knowledge updates and the integration of domain-specific information (Figure 1).

Figure 1: Technology tree of RAG research, covering work at the pre-training, fine-tuning, and inference stages, with an emphasis on leveraging the in-context learning abilities of LLMs.

RAG Paradigms

RAG research has evolved from the Naive RAG paradigm through Advanced RAG to the Modular RAG framework, with each paradigm representing a progressive enhancement over its predecessors.

  1. Naive RAG: This initial approach integrates basic indexing, retrieval, and generation processes. By retrieving relevant context from external sources and combining it with LLM capabilities, Naive RAG establishes a baseline for subsequent developments.
  2. Advanced RAG: This paradigm introduces optimization strategies for both pre-retrieval and post-retrieval processes. It focuses on enhancing retrieval precision, query optimization, and document relevance, thus addressing the inadequacies found in the Naive RAG.
  3. Modular RAG: Building on its predecessors, Modular RAG adds flexibility and adaptability by incorporating dedicated modules for search, memory, and routing. This design allows a more nuanced interaction between retrieval and generation components and facilitates iterative and adaptive retrieval strategies (Figure 2); a minimal code sketch of the three paradigms follows the figure below.

    Figure 2: Comparison of RAG paradigms from Naive to Modular, highlighting process enhancements and structural flexibility.
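
To make the differences between the paradigms concrete, the following minimal sketch expresses each one as a composition of interchangeable components. Every callable name here (rewrite, retrieve, rerank, generate, needs_more_context) is a hypothetical placeholder standing in for an embedding model, vector store, reranker, or LLM, not an API from the paper or any particular library; treat it as one possible composition, not a reference implementation.

```python
from typing import Callable, List

# Hypothetical component signatures; any concrete retriever, reranker,
# or LLM could sit behind these callables.
RewriteFn = Callable[[str], str]                   # pre-retrieval query optimization
RetrieveFn = Callable[[str, int], List[str]]       # (query, k) -> top-k chunks
RerankFn = Callable[[str, List[str]], List[str]]   # post-retrieval reordering
GenerateFn = Callable[[str, List[str]], str]       # (query, context) -> answer


def naive_rag(query: str, retrieve: RetrieveFn, generate: GenerateFn, k: int = 5) -> str:
    """Naive RAG: retrieve top-k chunks, then generate in a single pass."""
    return generate(query, retrieve(query, k))


def advanced_rag(query: str, rewrite: RewriteFn, retrieve: RetrieveFn,
                 rerank: RerankFn, generate: GenerateFn,
                 k: int = 20, keep: int = 5) -> str:
    """Advanced RAG: wrap pre-retrieval (query rewriting) and
    post-retrieval (reranking) optimizations around the naive flow."""
    candidates = retrieve(rewrite(query), k)     # over-retrieve a candidate pool
    context = rerank(query, candidates)[:keep]   # keep only the most relevant chunks
    return generate(query, context)


def modular_rag(query: str, rewrite: RewriteFn, retrieve: RetrieveFn,
                rerank: RerankFn, generate: GenerateFn,
                needs_more_context: Callable[[str], bool],
                max_rounds: int = 3) -> str:
    """Modular RAG (one possible composition): iterate retrieval and
    generation adaptively until the draft answer stops signalling gaps."""
    context: List[str] = []
    probe = query
    answer = ""
    for _ in range(max_rounds):
        context += rerank(query, retrieve(rewrite(probe), 20))[:5]
        answer = generate(query, context)
        if not needs_more_context(answer):
            break
        probe = answer  # let the draft answer steer the next retrieval round
    return answer
```

The point of the sketch is structural: Advanced RAG changes what happens before and after retrieval, while Modular RAG changes how the stages are wired together.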

Implementation of RAG in LLMs

RAG's core application has been to improve the performance of LLMs, such as GPT-3, by grounding their responses in retrieved external knowledge. The process involves three foundational steps:

  • Indexing: Documents are split into chunks, encoded into vectors, and stored in a vector database (Figure 3). This foundational step is crucial for the efficient similarity-based retrieval of context.
  • Retrieval: Top-k relevant document chunks are retrieved based on their semantic similarity to the posed query. This retrieval process is fine-tuned to ensure the most relevant information is selected for subsequent generation.
  • Generation: The retrieved chunks are merged with the original query and passed to an LLM, which generates the final response. This allows the system to weave new, contextually accurate information into its answers (Figure 3); a minimal code sketch of these three steps follows the figure below.

    Figure 3: A representative instance of the RAG process applied to question answering, detailing indexing, retrieval, and generation steps.
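
As a concrete illustration of these three steps, the minimal sketch below indexes a few documents, retrieves the top-k chunks by cosine similarity, and assembles the augmented prompt. The embed and llm callables are assumed stand-ins for an embedding model and a generator; only NumPy is used for the similarity search, whereas a production system would typically use a dedicated vector database rather than an in-memory matrix.

```python
import numpy as np
from typing import Callable, List


def chunk(text: str, size: int = 500) -> List[str]:
    """Indexing, step 1: split a document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_index(docs: List[str], embed: Callable[[str], np.ndarray]):
    """Indexing, step 2: encode every chunk and stack the vectors into a matrix."""
    chunks = [c for doc in docs for c in chunk(doc)]
    vectors = np.stack([embed(c) for c in chunks])
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # unit-normalize once
    return chunks, vectors


def retrieve(query: str, chunks: List[str], vectors: np.ndarray,
             embed: Callable[[str], np.ndarray], k: int = 3) -> List[str]:
    """Retrieval: rank chunks by cosine similarity to the query embedding."""
    q = embed(query)
    q = q / np.linalg.norm(q)
    top = np.argsort(-(vectors @ q))[:k]
    return [chunks[i] for i in top]


def generate_answer(query: str, docs: List[str],
                    embed: Callable[[str], np.ndarray],
                    llm: Callable[[str], str]) -> str:
    """Generation: merge the retrieved chunks with the query and call the LLM."""
    chunks, vectors = build_index(docs, embed)
    context = "\n\n".join(retrieve(query, chunks, vectors, embed))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return llm(prompt)
```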

Comparative Analysis with Other Methods

RAG methods are contrasted with other optimization techniques such as Fine-Tuning and Prompt Engineering. RAG augments the model with external knowledge at inference time, Fine-Tuning adapts the model through additional training, and Prompt Engineering exploits the model's inherent capabilities without modifying its parameters. Integrating these methods within RAG's Modular framework has enabled more sophisticated retrieval techniques, offering a more comprehensive solution for knowledge-intensive tasks (Figure 4); the short sketch after the figure illustrates where each technique acts.

Figure 4: RAG compared with other model optimization methods, highlighting the balance between external knowledge requirements and model adaptability.
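
The short sketch below shows where each technique intervenes under these assumptions: prompt engineering reshapes only the instruction, RAG additionally injects retrieved context into the prompt, and fine-tuning (noted only in a comment) modifies the model itself rather than the prompt. The llm and retrieve callables are hypothetical placeholders.

```python
def prompt_engineering(llm, question: str) -> str:
    # Prompt engineering: rely solely on the model's parametric knowledge,
    # shaping behaviour through the instruction alone.
    return llm("You are a careful domain expert. Think step by step.\n" + question)


def retrieval_augmented(llm, retrieve, question: str) -> str:
    # RAG: inject non-parametric (external) knowledge into the prompt
    # at inference time, without touching the model's weights.
    context = "\n".join(retrieve(question, 3))
    return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

# Fine-tuning, by contrast, updates the model's parameters on task-specific
# data, so it would change `llm` itself rather than the prompt passed to it.
```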

Future Directions and Challenges

The survey identifies ongoing challenges within RAG systems, emphasizing the need for more robust retrieval mechanisms and tighter integration strategies. As RAG continues to evolve, its applications are expanding into multimodal domains, incorporating not only textual but also visual and auditory data to provide comprehensive, contextually enriched responses. The survey suggests that future research focus on improving retrieval efficiency, developing evaluation benchmarks, and exploring hybrid methods that combine parametric (model-internal) and non-parametric (retrieved) knowledge.

Conclusion

Retrieval-Augmented Generation for LLMs represents a crucial development in natural language processing, providing a means to overcome the inherent limitations of standalone LLMs. Through the progressive enhancements encapsulated in the Naive, Advanced, and Modular RAG paradigms, this technology has established a framework for integrating contextual knowledge into AI models. The ongoing research and development in this area promise further improvements in the versatility and applicability of LLMs across various domains, ultimately contributing to the advancement of AI-driven technologies (Figure 5).

Figure 5: A comprehensive summary of the RAG ecosystem, illustrating interconnected developments and applications.
