
From Beginner to Expert: Modeling Medical Knowledge into General LLMs (2312.01040v3)

Published 2 Dec 2023 in cs.CL

Abstract: Recently, LLM-based AI systems have demonstrated remarkable capabilities in natural language understanding and generation. However, these models face a significant challenge in sensitive applications, such as reasoning over medical knowledge and answering medical questions in a physician-like manner. Prior studies attempted to overcome this challenge by increasing the model size (>100B) to learn more general medical knowledge, while there is still room for improvement in LLMs with smaller-scale model sizes (<100B). In this work, we start from a pre-trained general LLM (AntGLM-10B) and fine-tune it from a medical beginner towards a medical expert (called AntGLM-Med-10B), leveraging a 3-stage optimization procedure: general medical knowledge injection, medical domain instruction tuning, and specific medical task adaptation. Our contributions are threefold: (1) We specifically investigate how to adapt a pre-trained general LLM to the medical domain, especially for a specific medical task. (2) We collect and construct large-scale medical datasets for each stage of the optimization process. These datasets encompass various data types and tasks, such as question answering, medical reasoning, multiple-choice questions, and medical conversations. (3) Specifically for multiple-choice questions in the medical domain, we propose a novel Verification-of-Choice approach for prompt engineering, which significantly enhances the reasoning ability of LLMs. Remarkably, by combining the above approaches, our AntGLM-Med-10B model outperforms most LLMs on PubMedQA, including both general and medical LLMs, even those with larger model sizes.

Citations (9)

Summary

  • The paper introduces a three-stage optimization process that injects general medical knowledge, conducts domain-specific instruction tuning, and adapts the model for specific tasks.
  • It leverages diverse datasets and techniques like chain-of-thought prompting and LoRA tuning to refine AntGLM-10B into the specialized AntGLM-Med-10B model.
  • Experimental results demonstrate that the adapted model achieves an 80.6% accuracy on PubMedQA, outperforming larger LLMs in medical reasoning.

From Beginner to Expert: Modeling Medical Knowledge into General LLMs

The paper "From Beginner to Expert: Modeling Medical Knowledge into General LLMs" presents a methodology to adapt a general LLM for specific medical applications, starting from a general pre-trained model, AntGLM-10B, and fine-tuning it into a medical domain expert, referred to as AntGLM-Med-10B. This approach utilizes a structured, three-stage optimization process, which includes the injection of medical knowledge, domain-specific instruction tuning, and adaptation for specific medical tasks. This essay will dissect the methods, datasets, and techniques employed to achieve notable performance in medical domain tasks, specifically focusing on its application to the PubMedQA dataset.

Optimization Framework

The developed framework involves a three-stage optimization procedure designed to systematically introduce medical knowledge and reasoning capabilities into a general-purpose LLM.

Stage 1: General Medical Knowledge Injection

In the first stage, general medical knowledge is injected into the model through continual pre-training on a diverse set of medical datasets, including textbooks, knowledge graphs, question-answer pairs, exam questions, and articles (Figure 1).

Figure 1: The 3-stage optimization procedure of AntGLM-Med-10B. Different data types and medical tasks are utilized to achieve competitive performance.
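
To make the continual pre-training step concrete, the sketch below shows how a mixed medical corpus might be interleaved and fed to a standard causal language-modeling trainer. It uses Hugging Face Transformers and Datasets as a stand-in for the paper's GLM-based training stack; the model name, file names, mixing weights, and hyperparameters are illustrative assumptions, not the paper's reported settings.

```python
# Minimal sketch of Stage 1 (general medical knowledge injection) via
# continual pre-training on a mixture of medical corpora. All names and
# numbers below are illustrative assumptions.
from datasets import interleave_datasets, load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Stand-in checkpoint for the general base model (AntGLM-10B in the paper).
tokenizer = AutoTokenizer.from_pretrained("my-org/general-llm-10b")
model = AutoModelForCausalLM.from_pretrained("my-org/general-llm-10b")

# Interleave heterogeneous medical sources (textbooks, QA pairs, articles, ...)
# with assumed sampling probabilities.
sources = [
    load_dataset("json", data_files="medical_textbooks.jsonl", split="train"),
    load_dataset("json", data_files="medical_qa_pairs.jsonl", split="train"),
    load_dataset("json", data_files="medical_articles.jsonl", split="train"),
]
corpus = interleave_datasets(sources, probabilities=[0.4, 0.3, 0.3], seed=0)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

corpus = corpus.map(tokenize, batched=True, remove_columns=corpus.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="stage1-med-pretrain",
                           per_device_train_batch_size=2,
                           num_train_epochs=1),
    train_dataset=corpus,
    # mlm=False gives the standard next-token (causal LM) objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Interleaving with explicit sampling probabilities keeps rarer but high-value sources (e.g., knowledge graphs serialized as text) represented throughout training rather than exhausted early.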

Stage 2: Medical Domain Instruction Tuning

The second stage focuses on enriching the LLM with medical task types by instruction tuning, utilizing datasets like PromptCBLUE, Chinese Examination datasets, and various QA datasets. This stage aims to incorporate task-specific knowledge into the LLM.
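
A key mechanical step in instruction tuning is mapping heterogeneous tasks onto one shared prompt schema. The sketch below illustrates this with a hypothetical template and field names; the paper's exact prompt formats are not reproduced here.

```python
# A minimal sketch of formatting heterogeneous medical tasks into a single
# instruction-tuning schema. The template and field names are illustrative
# assumptions, not the paper's prompt formats.
INSTRUCTION_TEMPLATE = (
    "Instruction: {instruction}\n"
    "Input: {input}\n"
    "Response: {output}"
)

def format_example(task_type: str, example: dict) -> str:
    """Map a raw example from any task (QA, multi-choice, dialogue)
    onto the shared instruction template."""
    if task_type == "multi_choice":
        choices = "\n".join(f"({k}) {v}" for k, v in example["choices"].items())
        instruction = "Answer the following medical multiple-choice question."
        inp = f"{example['question']}\n{choices}"
        out = example["answer"]
    elif task_type == "qa":
        instruction = "Answer the following medical question."
        inp = example["question"]
        out = example["answer"]
    else:  # e.g. conversation turns flattened into context/response pairs
        instruction = "Continue the medical consultation."
        inp = example["context"]
        out = example["response"]
    return INSTRUCTION_TEMPLATE.format(instruction=instruction, input=inp, output=out)
```

Each formatted string can then be trained on with the same language-modeling objective as Stage 1, typically with the loss restricted to the response tokens.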

Stage 3: Specific Medical Task Adaptation

Finally, the model is adapted to specific medical tasks using datasets such as PubMedQA. This stage introduces Verification-of-Choice (VoC), a novel prompting technique that enhances reasoning by having the model self-verify its generated answers, improving accuracy on medical multiple-choice questions (Figure 2).

Figure 2: The detailed techniques for different optimization stages.
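
The following sketch captures one plausible reading of the VoC loop: draft an answer with chain-of-thought reasoning, verify each candidate choice independently, then decide in light of the verdicts. The prompt wording and the `generate` helper are assumptions for illustration, not the paper's exact prompts.

```python
# A minimal sketch of Verification-of-Choice (VoC) prompting, as described
# at a high level in the paper. Prompt wording and the `generate` wrapper
# are assumptions for illustration.

def generate(prompt: str) -> str:
    """Hypothetical wrapper around the deployed LLM's text-generation API."""
    raise NotImplementedError

def verification_of_choice(question: str, choices: dict[str, str]) -> str:
    # 1. Draft: chain-of-thought reasoning toward an initial answer.
    draft = generate(
        f"Question: {question}\nChoices: {choices}\n"
        "Let's think step by step, then give an initial answer."
    )
    # 2. Verify: check each candidate choice independently.
    verdicts = {}
    for label, text in choices.items():
        verdicts[label] = generate(
            f"Question: {question}\n"
            f"Candidate answer: ({label}) {text}\n"
            f"Initial reasoning: {draft}\n"
            "Verify whether this candidate is consistent with the medical "
            "facts in the question. Answer 'supported' or 'refuted' with a reason."
        )
    # 3. Decide: pick the final answer in light of all verdicts.
    return generate(
        f"Question: {question}\nChoices: {choices}\n"
        f"Verification results: {verdicts}\n"
        "Based on the verifications, output the single best choice label."
    )
```

Compared with plain chain-of-thought, the per-choice verification step gives the model a chance to catch and discard an initially plausible but unsupported answer before committing.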

Model and Training Implementation

AntGLM-10B, the base model, is built on the GLM architecture, which combines autoencoding and autoregressive objectives. The model undergoes extensive continual pre-training to absorb the medical corpora, ensuring that foundational medical knowledge is integrated before task-specific instruction tuning and adaptation.

Training Specifications

Training combines strategies such as chain-of-thought prompting, chain-of-verification, and LoRA tuning to balance generalization and specialization on medical tasks (Figure 3).

Figure 3: A comparison example for Chain-of-Thought and Verification-of-Choice.
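
Of these, LoRA is the most standardized component. The sketch below shows parameter-efficient adaptation with the Hugging Face PEFT library; the checkpoint name, target modules, and hyperparameters are common defaults rather than the paper's reported configuration.

```python
# A minimal sketch of parameter-efficient task adaptation with LoRA via the
# Hugging Face PEFT library. Checkpoint name, target modules, and
# hyperparameters below are assumed defaults, not the paper's settings.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

# Hypothetical checkpoint produced by the earlier optimization stages.
model = AutoModelForCausalLM.from_pretrained("my-org/antglm-med-stage2")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                 # low-rank dimension
    lora_alpha=32,                       # scaling factor
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # attention projections (model-dependent)
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()       # only the LoRA adapters are trainable
```

Because only the low-rank adapter matrices are trainable, task adaptation touches a small fraction of the 10B parameters, keeping Stage 3 cheap relative to the earlier stages.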

Experimental Results

The AntGLM-Med-10B model achieved competitive results on PubMedQA, with an accuracy of 80.6%. This performance surpasses several larger LLMs, demonstrating the efficacy of the structured fine-tuning stages in injecting medical knowledge.

Performance Analysis

The model's steady performance gains across the optimization stages highlight the effectiveness of the three-stage training approach. By leveraging knowledge-rich datasets and innovative prompting strategies, AntGLM-Med-10B approaches, and in some cases surpasses, larger models in medical reasoning (Figure 4).

Figure 4: Accuracy results on PubMedQA at different optimization stages.

Conclusion

The framework presented for adapting general LLMs into medical domain specialists, as demonstrated by the AntGLM-Med-10B model, shows significant promise for enhancing domain-specific reasoning capabilities. Through careful dataset selection and innovative instruction tuning, the approach not only improves performance but also reduces model-size requirements while maintaining competitive accuracy. Future research could explore further optimization and extension to other specialized domains, leveraging the modular and scalable nature of this approach.
