
Abstract

The emergent abilities of LLMs have demonstrated great potential in solving medical questions. LLMs possess considerable medical knowledge, but they may still hallucinate and are inflexible in knowledge updates. While Retrieval-Augmented Generation (RAG) has been proposed to enhance the medical question-answering capabilities of LLMs with external knowledge bases, it may still fail in complex cases where multiple rounds of information seeking are required. To address this issue, we propose iterative RAG for medicine (i-MedRAG), in which LLMs iteratively ask follow-up queries based on previous information-seeking attempts. In each iteration of i-MedRAG, the follow-up queries are answered by a vanilla RAG system, and the resulting answers are used to guide query generation in the next iteration. Our experiments show that i-MedRAG improves the performance of various LLMs over vanilla RAG on complex questions from clinical vignettes in the United States Medical Licensing Examination (USMLE), as well as on various knowledge tests in the Massive Multitask Language Understanding (MMLU) dataset. Notably, our zero-shot i-MedRAG outperforms all existing prompt engineering and fine-tuning methods on GPT-3.5, achieving an accuracy of 69.68% on the MedQA dataset. In addition, we characterize the scaling properties of i-MedRAG with different numbers of iterations of follow-up queries and different numbers of queries per iteration. Our case studies show that i-MedRAG can flexibly ask follow-up queries to form reasoning chains, providing in-depth analysis of medical questions. To the best of our knowledge, this is the first study to incorporate follow-up queries into medical RAG.

Figure: Comparison of the i-MedRAG and RAG pipelines, with iterative generation of question-specific medical QA pairs.

Overview

  • The paper introduces the iterative RAG (i-MedRAG) framework, enhancing traditional RAG systems with the capacity to generate follow-up queries for complex medical question answering.

  • i-MedRAG shows significant performance improvements on medical QA benchmarks, achieving state-of-the-art accuracy on the MedQA dataset using GPT-3.5.

  • Comprehensive analysis and case studies demonstrate i-MedRAG's ability to handle multi-step reasoning tasks by iteratively retrieving and synthesizing information.

Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions

The paper "Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions" by Xiong et al. introduces a novel approach to enhancing retrieval-augmented generation (RAG) systems in the medical domain. The proposed iterative RAG (i-MedRAG) framework aims to address the limitations of existing RAG models when handling complex medical questions that require multiple rounds of information-seeking and reasoning.

Summary of Contributions

The study presents three primary contributions:

  1. Introduction of i-MedRAG: The paper proposes a novel RAG architecture that incorporates iterative follow-up queries, allowing LLMs to seek additional information in a step-by-step manner. This approach helps in handling complex reasoning tasks where single-round information retrieval is insufficient.
  2. Empirical Validation: The proposed i-MedRAG demonstrates superior performance on multiple medical question-answering benchmarks, including the United States Medical Licensing Examination (USMLE) subset of MedQA and medical tasks from the Massive Multitask Language Understanding (MMLU) dataset. Notably, the zero-shot i-MedRAG achieves a state-of-the-art accuracy of 69.68% on the MedQA dataset with GPT-3.5, outperforming all existing prompt engineering and fine-tuning methods.
  3. Performance Analysis: The authors analyze the scaling properties of i-MedRAG, showing how its performance varies with different numbers of iterations and queries per iteration. The case studies illustrate the framework's efficacy in breaking down complex clinical questions and forming reasoning chains by flexibly generating follow-up queries.

Methodology

Iterative RAG (i-MedRAG)

The i-MedRAG framework enhances the traditional RAG setup by allowing the model to iteratively generate and answer follow-up queries based on initial and progressively gathered information. Each iteration involves the following steps:

  • Query Generation: The LLM generates follow-up queries conditioned on the original question and the information gathered so far.
  • Document Retrieval and Answer Generation: Each generated query is passed to a vanilla RAG system, which retrieves relevant documents and produces a corresponding answer.
  • Information Aggregation: The resulting query-answer pairs augment the context of the original question in subsequent iterations.

This iterative process continues until a predefined number of iterations is completed, accumulating a comprehensive collection of relevant information to address the original question more effectively. A minimal sketch of the loop follows.
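To make the loop concrete, here is a minimal Python sketch of the procedure described above. It is an illustration under assumed interfaces, not the authors' implementation: `llm` stands in for any text-completion call, `retrieve` for any document retriever, and the prompt templates are placeholders.

```python
def vanilla_rag_answer(query: str, retrieve, llm, k: int = 5) -> str:
    """Answer a single follow-up query with plain retrieval-augmented generation."""
    docs = retrieve(query, top_k=k)  # fetch k relevant snippets from the corpus
    context = "\n".join(docs)
    return llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

def i_medrag(question: str, retrieve, llm,
             n_iterations: int = 3, n_queries: int = 2) -> str:
    """Hypothetical i-MedRAG loop: iterate query generation, RAG answering,
    and aggregation of the accumulated query-answer pairs."""
    qa_history: list[tuple[str, str]] = []  # accumulated (query, answer) pairs

    for _ in range(n_iterations):
        # 1. Query generation: ask the LLM for follow-up queries,
        #    conditioned on the question and all prior QA pairs.
        history = "\n".join(f"Q: {q}\nA: {a}" for q, a in qa_history)
        prompt = (f"Question: {question}\n\nPrevious findings:\n{history}\n\n"
                  f"Write {n_queries} follow-up queries, one per line:")
        queries = llm(prompt).strip().splitlines()[:n_queries]

        # 2. Document retrieval and answer generation: each follow-up
        #    query is answered by a vanilla RAG call.
        for q in queries:
            qa_history.append((q, vanilla_rag_answer(q, retrieve, llm)))

    # 3. Information aggregation: answer the original question with
    #    the full set of accumulated QA pairs as context.
    history = "\n".join(f"Q: {q}\nA: {a}" for q, a in qa_history)
    return llm(f"Findings:\n{history}\n\nQuestion: {question}\nAnswer:")
```

The accumulated query-answer pairs play the role of the question-specific medical QA pairs shown in the figure above; each iteration's queries can build on the answers gathered in earlier iterations, which is what lets the system form reasoning chains.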

Evaluation and Results

The evaluation results underscore the effectiveness of i-MedRAG:

  • Improvement over Vanilla RAG: Compared to vanilla RAG systems, i-MedRAG consistently improves performance across multiple medical QA benchmarks, particularly excelling in complex, multi-step reasoning questions.
  • State-of-the-Art Performance on MedQA: i-MedRAG achieves an accuracy of 69.68% on the MedQA dataset, which contains clinical vignettes from the USMLE. This represents a significant improvement over previous methods such as MedRAG and various prompt engineering techniques.
  • Generalizability: The approach was tested across different LLMs, including GPT-3.5 and Llama-3.1-8B, and demonstrated robust performance improvements on both the MedQA and MMLU-Med datasets.
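As a rough illustration of how the scaling analysis over iterations and queries per iteration could be run, the following hypothetical harness sweeps both hyperparameters using the `i_medrag` sketch above. Here `dev_set`, `retrieve`, and `llm` are assumed to exist, each dataset item is assumed to carry "question", "options" (letter-to-text dict), and "answer" (letter) keys, and the scoring is a deliberately naive substring match.

```python
def accuracy(dataset, retrieve, llm, n_iterations: int, n_queries: int) -> float:
    """Score multiple-choice QA accuracy for one hyperparameter setting."""
    correct = 0
    for item in dataset:
        prompt = item["question"] + "\nOptions: " + "; ".join(
            f"{letter}. {text}" for letter, text in item["options"].items())
        prediction = i_medrag(prompt, retrieve, llm,
                              n_iterations=n_iterations, n_queries=n_queries)
        correct += item["answer"] in prediction  # crude letter match
    return correct / len(dataset)

# Sweep the two hyperparameters characterized in the paper's scaling analysis.
for n_iter in (1, 2, 3, 4):
    for n_q in (1, 2, 3):
        print(n_iter, n_q, accuracy(dev_set, retrieve, llm, n_iter, n_q))
```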

Practical and Theoretical Implications

The practical implications of i-MedRAG are notable in the context of medical question answering systems. The ability to dynamically generate follow-up queries enables a more thorough understanding of, and more accurate responses to, complex medical scenarios. From a theoretical standpoint, the introduction of an iterative retrieval and generation process represents a significant enhancement in leveraging LLMs for complex reasoning tasks. This approach could potentially be generalized to other domains requiring intricate information synthesis, such as legal, scientific, or technical question answering.

Future Directions

Future work could focus on reducing the computational costs associated with the iterative querying process. The paper also suggests possible improvements through few-shot learning and further optimization of hyperparameters to enhance the system's efficiency and effectiveness. Another intriguing direction involves the automation of hyperparameter selection and the exploration of dynamic adjustment strategies based on the complexity of the questions being answered.

Conclusion

The i-MedRAG framework introduced by Xiong et al. represents a substantial advancement in the application of LLMs to medical question answering. By iteratively generating and answering follow-up queries, i-MedRAG addresses key limitations of existing RAG systems, achieving state-of-the-art performance on challenging medical QA datasets. This study sets a precedent for future research in enhancing the capabilities of LLMs through iterative and dynamic information retrieval and reasoning processes.
