PMC-LLaMA: Towards Building Open-source Language Models for Medicine (2304.14454v3)

Published 27 Apr 2023 in cs.CL

Abstract: Recently, LLMs have showcased remarkable capabilities in natural language understanding. While demonstrating proficiency in everyday conversations and question-answering situations, these models frequently struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge. In this paper, we describe the procedure for building a powerful, open-source LLM specifically designed for medical applications, termed PMC-LLaMA. Our contributions are threefold: (i) we systematically investigate the process of adapting a general-purpose foundation LLM to the medical domain; this involves data-centric knowledge injection through the integration of 4.8M biomedical academic papers and 30K medical textbooks, as well as comprehensive fine-tuning for alignment with domain-specific instructions; (ii) we contribute a large-scale, comprehensive dataset for instruction tuning, encompassing medical question-answering (QA), rationales for reasoning, and conversational dialogues, comprising a total of 202M tokens; (iii) we conduct thorough ablation studies to demonstrate the effectiveness of each proposed component. When evaluated on various public medical question-answering benchmarks, our lightweight PMC-LLaMA, which consists of only 13 billion parameters, exhibits superior performance, even surpassing ChatGPT. All models, code, and datasets can be found at https://github.com/chaoyi-wu/PMC-LLaMA.

Summary

  • The paper introduces PMC-LLaMA, demonstrating a novel approach of integrating 4.8M biomedical papers and 30K medical textbooks for domain-specific performance.
  • The methodology combines data-centric knowledge injection with medical-specific instruction tuning to improve accuracy in medical QA, reasoning, and dialogue tasks.
  • Experimental evaluations reveal that PMC-LLaMA outperforms conventional models in both zero-shot and fine-tuned scenarios on benchmarks like PubMedQA, MedMCQA, and USMLE.

PMC-LLaMA: Towards Building Open-source LLMs for Medicine

The paper introduces PMC-LLaMA, an open-source LLM specifically designed for medical applications, addressing limitations in existing LLMs when dealing with domain-specific tasks, particularly in medicine. The model leverages two key processes—data-centric knowledge injection and medical-specific instruction tuning—to enhance the applicability and precision of LLMs within the medical domain.

Introduction to PMC-LLaMA

PMC-LLaMA is developed by adapting a general-purpose LLM for the medical domain. The adaptation process involves:

  1. Data-centric Knowledge Injection: Utilizing a large corpus of de-duplicated and pre-processed medical data, including 4.8 million biomedical papers and 30,000 medical textbooks, to imbue the model with domain-specific knowledge.
  2. Medical-specific Instruction Tuning: This process aligns the model with the requirements of domain-specific tasks through comprehensive fine-tuning on a large-scale medical instruction dataset covering medical QA, reasoning rationales, and conversational dialogues (Figure 1).

    Figure 1: The training pipeline of PMC-LLaMA, demonstrating knowledge injection and instruction tuning stages.

The practical implementation of PMC-LLaMA involves careful injection of domain-specific knowledge, ensuring the model comprehends complex medical terminology and reasoning processes.
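
The two training stages map naturally onto standard causal-language-model training. As a rough sketch, the pipeline below expresses both stages with the Hugging Face `transformers` Trainer; the base checkpoint name, file paths, and hyperparameters are illustrative placeholders rather than the paper's exact configuration.

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

BASE_MODEL = "huggyllama/llama-13b"  # placeholder for the general-purpose foundation LLM
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token        # LLaMA tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # next-token prediction

def run_stage(dataset_path: str, output_dir: str, epochs: int):
    """One autoregressive training pass over a JSONL corpus with a 'text' field."""
    data = load_dataset("json", data_files=dataset_path)["train"]
    data = data.map(tokenize, batched=True, remove_columns=data.column_names)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=output_dir,
                               per_device_train_batch_size=1,
                               num_train_epochs=epochs),
        train_dataset=data,
        data_collator=collator,
    )
    trainer.train()

# Stage 1: data-centric knowledge injection on papers + textbooks.
run_stage("medical_corpus.jsonl", "medllama_knowledge_injection", epochs=1)
# Stage 2: medical-specific instruction tuning on prompt/response pairs.
run_stage("medical_instructions.jsonl", "pmc_llama_instruction_tuned", epochs=3)
```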

Knowledge Injection

The initial stage of training PMC-LLaMA includes a data-centric focus to expose the model to comprehensive medical information. This foundational training is driven by two primary data sources:

  • Biomedical Papers: Injecting up-to-date medical insights from academic papers, identified by their PubMed Central (PMC) IDs.
  • Medical Textbooks: Broadening the model's exposure to foundational knowledge by integrating textbooks spanning diverse medical specialties (Figure 2).

    Figure 2: Distribution of medical textbook categories, depicting the diversity of sources integrated into the model.

Knowledge injection helps the model build robust representations of complex medical terminology, which is essential for its effectiveness in precision-critical medical scenarios.
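
A minimal sketch of how such a corpus could be assembled is shown below: exact-duplicate papers are dropped via a normalized hash, and textbook text is appended as additional documents. File names and record fields here are assumptions for illustration, not the paper's released preprocessing code.

```python
import hashlib
import json
from pathlib import Path

def doc_hash(text: str) -> str:
    """Stable fingerprint of a document, used to drop exact duplicates."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode()).hexdigest()

seen = set()
with open("medical_corpus.jsonl", "w") as out:
    # Biomedical papers: one JSON record per PMC article, with a "text" field (assumed layout).
    for line in open("pmc_papers.jsonl"):
        rec = json.loads(line)
        h = doc_hash(rec["text"])
        if h in seen:
            continue  # skip duplicate articles across dumps
        seen.add(h)
        out.write(json.dumps({"source": "paper", "text": rec["text"]}) + "\n")

    # Medical textbooks: one plain-text file per book (assumed layout).
    for book in Path("textbooks").glob("*.txt"):
        text = book.read_text(errors="ignore")
        out.write(json.dumps({"source": "textbook", "text": text}) + "\n")
```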

Instruction Tuning

Following knowledge injection, PMC-LLaMA undergoes instruction tuning to refine its ability to process medical instructions accurately. This involves:

  • Medical Conversation Data: Using patient-physician dialogue datasets to simulate realistic interactions and responses.
  • Reasoning QA Data: Enhancing reasoning capabilities by integrating datasets that require detailed rationale beyond simple question answering.
  • Knowledge Graph Data: Using structured knowledge-graph entries to familiarize the model with explicit medical term definitions and relationships (a formatting sketch for all three sources follows below).
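
One way to cast these three sources into a single prompt/response format for instruction tuning is sketched below; the template wording and field names are illustrative assumptions, not the paper's released prompts.

```python
# Assumed instruction template; the real prompt wording may differ.
PROMPT = ("Below is an instruction that describes a task.\n\n"
          "### Instruction:\n{instruction}\n\n"
          "### Input:\n{input}\n\n"
          "### Response:\n")

def qa_with_rationale(example):
    """Multiple-choice QA where the answer is followed by its rationale."""
    return {
        "prompt": PROMPT.format(
            instruction="Answer the medical question and explain your reasoning.",
            input=example["question"] + "\n" + example["options"]),
        "response": f"{example['answer']}\n\nRationale: {example['rationale']}",
    }

def conversation(example):
    """Patient-physician dialogue turned into a single-turn exchange."""
    return {
        "prompt": PROMPT.format(
            instruction="Respond to the patient as a physician.",
            input=example["patient_turn"]),
        "response": example["physician_turn"],
    }

def kg_fact(example):
    """Knowledge-graph entry rendered as a term-definition instruction."""
    return {
        "prompt": PROMPT.format(
            instruction="Describe the medical term.",
            input=example["entity"]),
        "response": example["definition"],
    }
```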

These processes allow PMC-LLaMA to operate proficiently in zero-shot scenarios, offering substantial improvements over existing models like ChatGPT in terms of accuracy and domain adaptation (Figure 3).

Figure 3: Patient-Physician Conversation, showcasing PMC-LLaMA’s proficiency in practical dialogue settings.

Experimental Evaluation

PMC-LLaMA's performance evaluation spans multiple medical QA benchmarks such as PubMedQA, MedMCQA, and USMLE, using accuracy as the primary metric. Comparisons with existing models demonstrate PMC-LLaMA's superior performance, attributed to the detailed domain-specific tuning and extensive dataset integration.

  • Task-specific Fine-tuning Evaluation: Models without instruction tuning are further refined on medical QA datasets, showcasing marked improvements in specialized scenarios.
  • Zero-shot Instruction Evaluation: Demonstrates PMC-LLaMA's capability to generate accurate responses without additional fine-tuning, emphasizing its robust foundational knowledge in medicine (see the evaluation sketch below).
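
As a concrete illustration of the zero-shot protocol, the sketch below scores each answer option of a multiple-choice question by its log-likelihood under the model and reports accuracy. The checkpoint path, dataset fields, and prompt format are assumptions for illustration; it also assumes the prompt tokenization is a prefix of the full prompt-plus-option tokenization.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CKPT = "path/to/pmc-llama-checkpoint"  # placeholder: point at the released PMC-LLaMA weights
tokenizer = AutoTokenizer.from_pretrained(CKPT)
model = AutoModelForCausalLM.from_pretrained(CKPT).eval()

@torch.no_grad()
def option_score(question: str, option: str) -> float:
    """Average log-probability of the option tokens, conditioned on the question."""
    prompt = f"Question: {question}\nAnswer: "
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    enc = tokenizer(prompt + option, return_tensors="pt")
    logits = model(**enc).logits[0, :-1]             # predictions for tokens 1..L-1
    targets = enc.input_ids[0, 1:]                   # tokens those predictions should match
    logprobs = torch.log_softmax(logits, dim=-1)
    token_lp = logprobs[torch.arange(targets.shape[0]), targets]
    return token_lp[prompt_len - 1:].mean().item()   # keep only the option's tokens

def accuracy(examples) -> float:
    """examples: iterable of {"question": str, "options": [str, ...], "answer_idx": int}."""
    hits, total = 0, 0
    for ex in examples:
        scores = [option_score(ex["question"], opt) for opt in ex["options"]]
        hits += int(max(range(len(scores)), key=scores.__getitem__) == ex["answer_idx"])
        total += 1
    return hits / total
```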

Conclusion

PMC-LLaMA successfully addresses the limitations of generic LLMs in medical domains by integrating domain-specific knowledge and instruction tuning, making it a versatile tool for handling intricate medical tasks. Its development sets a precedent for future LLM adaptations, particularly in critical fields requiring high precision and specialized knowledge. The release of PMC-LLaMA models, code, and datasets (available at https://github.com/chaoyi-wu/PMC-LLaMA) offers an invaluable resource for further advancements in medical AI applications.

Overall, PMC-LLaMA exemplifies a significant step towards enhancing LLM capabilities within medicine, showcasing practical improvements that surpass conventional models in both scope and accuracy across medical-related tasks.
