Few shot chain-of-thought driven reasoning to prompt LLMs for open ended medical question answering (2403.04890v3)
Abstract: In this paper, we propose a modified version of the MedQA-USMLE dataset, named MEDQA-OPEN, which contains open-ended medical questions without options to mimic clinical scenarios, along with clinician-approved reasoned answers. Additionally, we implement CLINICR, a prompt driven by Chain-of-Thought (CoT) reasoning that mirrors the prospective process of incremental reasoning toward a correct response to medical questions. We empirically demonstrate that CLINICR outperforms the state-of-the-art 5-shot CoT-based prompt (Liévin et al., 2022). We also present an approach that mirrors real-life clinical practice by first exploring multiple differential diagnoses through MCQ-CLINICR and subsequently narrowing down to a final diagnosis using MCQ-ELIMINATIVE. Finally, emphasizing the importance of response verification in medical settings, we utilize a reward model mechanism, replacing the elimination process performed by MCQ-ELIMINATIVE.
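The abstract does not reproduce the paper's prompt templates, so the following is only a minimal sketch of a generic few-shot chain-of-thought setup for open-ended medical questions, in the spirit of CLINICR. The exemplar text, the `build_cot_prompt` helper, and the `COT_TRIGGER` cue are illustrative assumptions, not the templates used in the paper.

```python
# Sketch: assembling a few-shot chain-of-thought (CoT) prompt for open-ended
# medical QA. Exemplars and helper names are hypothetical placeholders.

# Hypothetical few-shot exemplars: (question, step-by-step reasoning, answer).
FEW_SHOT_EXEMPLARS = [
    (
        "A 62-year-old man presents with crushing substernal chest pain "
        "radiating to the left arm. What is the most likely diagnosis?",
        "The pain pattern suggests myocardial ischemia; the age and acute "
        "presentation make acute coronary syndrome the leading concern.",
        "Acute myocardial infarction",
    ),
    # ... additional exemplars would go here (the paper compares against a
    # 5-shot CoT baseline, so five worked examples is a reasonable default).
]

COT_TRIGGER = "Let's think step by step."  # standard CoT reasoning cue


def build_cot_prompt(question: str, exemplars=FEW_SHOT_EXEMPLARS) -> str:
    """Assemble a few-shot CoT prompt for an open-ended medical question."""
    parts = []
    for q, reasoning, answer in exemplars:
        parts.append(
            f"Question: {q}\n{COT_TRIGGER}\n{reasoning}\nAnswer: {answer}\n"
        )
    # The test question ends with the CoT trigger so the model produces its
    # own incremental reasoning before committing to a final answer.
    parts.append(f"Question: {question}\n{COT_TRIGGER}\n")
    return "\n".join(parts)


if __name__ == "__main__":
    prompt = build_cot_prompt(
        "A 25-year-old woman has fatigue, weight gain, and cold intolerance. "
        "What is the most likely diagnosis?"
    )
    print(prompt)  # this string would then be sent to the LLM being evaluated
```

In the paper's MCQ pipeline, the free-text reasoning produced by such a prompt is then either narrowed to a final choice by eliminative prompting (MCQ-ELIMINATIVE) or scored by a reward model for verification.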
- Katherine A. Batterton and Kimberly N. Hale. 2017. The Likert scale: what it is and how to use it. Phalanx, 50(2):32–39.
- E. Bolton et al. 2022. PubMedGPT 2.7B. Technical report, Stanford University Center for Research on Foundation Models.
- Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374.
- Scaling instruction-finetuned language models. arXiv preprint arXiv:2210.11416.
- The future landscape of large language models in medicine. Communications Medicine, 3(1):141.
- Compositional semantic parsing with large language models. arXiv preprint arXiv:2209.15003.
- A survey on automated fact-checking. Transactions of the Association for Computational Linguistics, 10:178–206.
- Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300.
- What disease does this patient have? A large-scale open domain question answering dataset from medical exams. Applied Sciences, 11(14):6421.
- PubMedQA: A dataset for biomedical research question answering. arXiv preprint arXiv:1909.06146.
- Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems, 35:22199–22213.
- Can large language models reason about medical questions? arXiv preprint arXiv:2207.08143.
- Explainable AI for clinical risk prediction: a survey of concepts, methods, and modalities. arXiv preprint arXiv:2308.08407.
- Show your work: Scratchpads for intermediate computation with language models. arXiv preprint arXiv:2112.00114.
- MedMCQA: A large-scale multi-subject multi-choice dataset for medical domain question answering. In Conference on Health, Inference, and Learning, pages 248–260. PMLR.
- Large language models encode clinical knowledge. arXiv preprint arXiv:2212.13138.
- Large language models in medicine. Nature Medicine, 29(8):1930–1940.
- Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
- BioMedLM: a domain-specific large language model for biomedical text. MosaicML. Accessed: Dec. 23.
- Secrets of RLHF in large language models, Part II: Reward modeling.
- Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837.
- Medical exam question answering with large-scale reading comprehension. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32.
- Learning to prompt for vision-language models. International Journal of Computer Vision, 130(9):2337–2348.