Emergent Mind


Large language models (LLMs) have demonstrated significant potential in transforming healthcare by automating tasks such as clinical documentation, information retrieval, and decision support. In this respect, carefully engineered prompts have emerged as a powerful tool for using LLMs for medical scenarios, e.g., patient clinical scenarios. In this paper, we propose a modified version of the MedQA-USMLE dataset, which is subjective, to mimic real-life clinical scenarios. We explore the Chain of Thought (CoT) reasoning based on subjective response generation for the modified MedQA-USMLE dataset with appropriate LM-driven forward reasoning for correct responses to the medical questions. Keeping in mind the importance of response verification in the medical setting, we utilize a reward training mechanism whereby the language model also provides an appropriate verified response for a particular response to a clinical question. In this regard, we also include human-in-the-loop for different evaluation aspects. We develop better in-contrast learning strategies by modifying the 5-shot-codex-CoT-prompt from arXiv:2207.08143 for the subjective MedQA dataset and developing our incremental-reasoning prompt. Our evaluations show that the incremental reasoning prompt performs better than the modified codex prompt in certain scenarios. We also show that greedy decoding with the incremental reasoning method performs better than other strategies, such as prompt chaining and eliminative reasoning.

Figure: Experiments on the MedQA-no-opt dataset using Llama-2-7B-chat.


  • This paper explores the application of LLMs for answering open-ended medical questions through a new prompting strategy called incremental reasoning prompts.

  • A modified MedQA-USMLE dataset is introduced to better simulate real-life clinical scenarios and test the effectiveness of the incremental reasoning prompts.

  • The study demonstrates that the incremental reasoning prompts outperform traditional Codex prompts in generating descriptive responses to open-ended questions.

  • Future research directions include applying the approach to other LLMs and expanding it to a wider range of medical datasets.

Few-shot Chain-of-thought Driven Reasoning for Open-ended Medical Question Answering


In healthcare, using LLMs for medical question answering is emerging as a promising way to support medical professionals and students. The paper methodically investigates how to make LLMs more effective at answering open-ended medical questions. Its distinguishing feature is a shift toward subjective response generation: the authors develop a modified MedQA-USMLE dataset that mirrors real-life clinical scenarios more accurately.


A pivotal contribution of this work is the introduction of an advanced prompting strategy designed specifically for the medical domain, described as incremental reasoning prompts. Unlike traditional few-shot Codex prompts that often resort to eliminative reasoning, this strategy advocates for a forward-looking chain of thought (CoT) process, which aligns more closely with the clinical diagnostic process.
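To make the idea of a few-shot, forward-reasoning prompt concrete, here is a minimal sketch of how such a prompt might be assembled. The `Exemplar` class, the `build_prompt` helper, and the "Question / Reasoning / Answer" layout are hypothetical choices for illustration and are not taken from the paper; the actual prompt wording used by the authors may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Exemplar:
    """One worked example for the few-shot prompt (hypothetical structure)."""
    question: str
    reasoning: str  # forward, step-by-step rationale leading to the answer
    answer: str


def build_prompt(exemplars: List[Exemplar], question: str) -> str:
    """Concatenate worked exemplars, then the new question with an open reasoning slot."""
    parts = []
    for ex in exemplars:
        parts.append(
            f"Question: {ex.question}\n"
            f"Reasoning: {ex.reasoning}\n"
            f"Answer: {ex.answer}\n"
        )
    # The model is expected to continue from "Reasoning:" and end with an answer.
    parts.append(f"Question: {question}\nReasoning:")
    return "\n".join(parts)
```

Because no answer options appear anywhere in the prompt, the exemplars can only demonstrate reasoning that builds toward an answer, not reasoning that eliminates listed choices.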

Key Differentiations and Dataset Modifications

  • The conventional Codex few-shot prompts and the newly proposed MedCodex few-shot prompts were employed and assessed against both the traditional MedQA dataset and a novel variant tailored to encourage descriptive responses.
  • The MedQA-USMLE dataset underwent substantial modifications to produce two distinct versions: one retaining its original multiple-choice question (MCQ) format (referred to as MedQA-Original) and another adapted for descriptive, open-ended questioning (MedQA-No-Opt). This adaptation was essential for simulating a more genuine clinical inquiry environment.
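The MCQ-to-open-ended adaptation can be pictured with a small sketch. It assumes the simplest possible conversion, namely dropping the options and keeping the text of the correct option as the reference answer; the field names (`question`, `options`, `answer_key`) and the toy item are hypothetical, and the paper's actual procedure may involve more than this.

```python
from typing import Dict


def to_open_ended(item: Dict) -> Dict:
    """Return an open-ended version of an MCQ item (options removed).

    Assumes `item` has hypothetical fields: "question" (str),
    "options" (dict mapping option letter to text), "answer_key" (letter).
    """
    reference = item["options"][item["answer_key"]]
    return {"question": item["question"], "answer": reference}


# Toy item for illustration only; not drawn from MedQA-USMLE.
toy_item = {
    "question": "Deficiency of which vitamin causes scurvy?",
    "options": {"A": "Vitamin A", "B": "Vitamin C", "C": "Vitamin D"},
    "answer_key": "B",
}
open_item = to_open_ended(toy_item)
```

After conversion, the model sees only the question, so it has to produce a descriptive answer rather than pick a letter.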

Results and Observations

The evaluation of the incremental reasoning prompt's effectiveness revealed nuanced performances across different scenarios:

  • When applied to the original MCQ-format dataset, the standard Codex prompting approach outperformed the incremental reasoning prompts. This disparity underscores the Codex pattern's proficiency in navigating the constrained choice space inherent in MCQs.
  • Conversely, the incremental reasoning prompts demonstrated a significant advantage over Codex prompts within the descriptive version of the dataset. The observed superiority highlights the importance of a more dynamic and holistic reasoning approach when confronting open-ended medical questions.

Furthermore, an experiment on differential diagnosis generation first generates a set of plausible answer options and then uses either the Codex prompt or a specialized verifier model to select the final answer. This two-stage approach resonates with the clinical decision-making process and also shows enhanced performance, especially when combined with the trained verifier model.
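The generate-then-select flow described above can be sketched as follows. This is only an outline under stated assumptions: `generate` and `score` are hypothetical callables standing in for the LLM that proposes candidate diagnoses and for the verifier model, and picking the highest-scoring candidate is an assumed selection rule, not one confirmed by the source.

```python
from typing import Callable, List


def select_answer(
    question: str,
    generate: Callable[[str], List[str]],  # hypothetical: proposes candidate answers
    score: Callable[[str, str], float],    # hypothetical: verifier score for (question, candidate)
) -> str:
    """Generate candidate answers, then return the one the verifier scores highest."""
    candidates = generate(question)
    if not candidates:
        raise ValueError("generator returned no candidates")
    return max(candidates, key=lambda c: score(question, c))
```

Keeping generation and verification as separate callables means the verifier can be swapped (for example, a prompted model versus a trained one) without touching the candidate generator.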

Implications and Future Directions

The study's implications extend beyond enhancing LLMs' performance in medical question answering. By introducing and validating the incremental reasoning prompt strategy, the research opens pathways for developing more nuanced and context-aware LLM applications in healthcare. This approach could potentially refine LLMs’ utility in clinical decision support, patient education, and medical training.

Looking ahead, the paper suggests several avenues for continued exploration. One is applying the verifier-based reward mechanism to LLMs other than the Llama-2 model tested. Another is extending the developed methods to a broader range of medical datasets, which could further validate the approach's effectiveness and adaptability.


The paper's exploration into using few-shot, chain-of-thought driven reasoning to prompt LLMs for open-ended medical question answering contributes valuable insights into the potential for AI-driven tools in healthcare. The development of the modified MedQA dataset, alongside the introduction of a novel prompting strategy, lays foundational work for future research aimed at enhancing the precision and relevance of LLM outputs in medical contexts.
