RAGGED: Towards Informed Design of Retrieval Augmented Generation Systems (2403.09040v2)
Abstract: Retrieval-augmented generation (RAG) can significantly improve the performance of large language models (LMs) by providing additional context for tasks such as document-based question answering (DBQA). However, the effectiveness of RAG is highly dependent on its configuration. To systematically find the optimal configuration, we introduce RAGGED, a framework for analyzing RAG configurations across various DBQA tasks. Using the framework, we discover distinct LM behaviors in response to varying context quantities, context qualities, and retrievers. For instance, while some models are robust to noisy contexts and monotonically perform better with more contexts, others are more noise-sensitive and can effectively use only a few contexts before their performance declines. The framework also provides a deeper analysis of these differences by evaluating the LMs' sensitivity to signal and noise under specific context quality conditions. Using RAGGED, researchers and practitioners can derive actionable insights about how to optimally configure their RAG systems for their specific question-answering tasks.
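The core experiment the abstract describes is a sweep over the number of retrieved contexts fed to a reader, measuring answer accuracy at each setting. A minimal sketch of that idea is below; the toy lexical retriever, string-match reader, and two-example dataset are stand-ins invented for illustration, not RAGGED's actual components or data.

```python
# Sketch of a RAGGED-style context-quantity sweep: retrieve k passages
# per question, run a reader over them, and record accuracy at each k.
# All components here are toy stand-ins for a real retriever and LM reader.

def retrieve(question, corpus, k):
    # Toy lexical retriever: rank passages by word overlap with the question.
    q_words = set(question.lower().split())
    ranked = sorted(corpus, key=lambda p: -len(q_words & set(p.lower().split())))
    return ranked[:k]

def read(question, contexts, answers):
    # Toy reader: returns a gold answer if it appears verbatim in any
    # retrieved context, otherwise abstains. A real LM reader generates.
    for gold in answers:
        if any(gold.lower() in c.lower() for c in contexts):
            return gold
    return ""

def sweep_context_quantity(dataset, corpus, ks):
    # Accuracy at each context count k (the "context quantity" axis).
    results = {}
    for k in ks:
        correct = 0
        for ex in dataset:
            ctxs = retrieve(ex["question"], corpus, k)
            pred = read(ex["question"], ctxs, ex["answers"])
            correct += int(pred != "" and pred in ex["answers"])
        results[k] = correct / len(dataset)
    return results

corpus = [
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
    "Berlin is the capital of Germany.",
]
dataset = [
    {"question": "What is the capital of France?", "answers": ["Paris"]},
    {"question": "What is the capital of Germany?", "answers": ["Berlin"]},
]
acc_by_k = sweep_context_quantity(dataset, corpus, ks=[1, 2, 3])
print(acc_by_k)
```

Plotting `acc_by_k` per reader model is what surfaces the behaviors the paper contrasts: a noise-robust reader's curve rises monotonically with k, while a noise-sensitive reader's curve peaks at small k and then degrades.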