Emergent Mind

Abstract

While recent commercial large language models (LLMs) have shown promising results on medical tasks, their closed-source nature poses significant privacy and security concerns, hindering their widespread use in the medical field. Despite efforts to create open-source models, their limited parameter counts often result in insufficient multi-step reasoning capabilities for solving complex medical problems. To address this, we introduce Meerkat-7B, a novel medical AI system with 7 billion parameters. Meerkat-7B was trained on our new synthetic dataset of high-quality chain-of-thought reasoning paths sourced from 18 medical textbooks, along with diverse instruction-following datasets. Our system achieved remarkable accuracy across seven medical benchmarks, surpassing GPT-3.5 by 13.1% and outperforming the previous best 7B models, MediTron-7B and BioMistral-7B, by 13.4% and 9.8%, respectively. Notably, it surpassed the passing threshold of the United States Medical Licensing Examination (USMLE) for the first time for a 7B-parameter model. Additionally, our system offered more detailed free-form responses to clinical queries than existing 7B and 13B models, approaching the performance level of GPT-3.5. This significantly narrows the performance gap with large LMs, showcasing its effectiveness in addressing complex medical challenges.

Meerkat-7B surpasses other 7B models, GPT-3.5, and even MediTron-70B on the MedQA benchmark.

Overview

  • Meerkat-7B, a small language model with 7 billion parameters, marks a significant advancement in medical AI by excelling in diagnostics and decision-making tasks.

  • It surpasses previous models and challenges larger ones like GPT-3.5 by showing exceptional performance across seven medical benchmark datasets.

  • The model’s training includes Chain-of-Thought reasoning from medical textbooks and synthetic data, enhancing its reasoning skills and understanding of complex medical information.

  • Its development offers a scalable and open-source solution for AI-assisted medical diagnostics, narrowing the performance gap between small models and LLMs.

Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks

Introduction to Meerkat-7B

The emergence of Meerkat-7B marks a significant advancement in leveraging smaller-scale models for complex problem-solving in medical AI. Rooted in a novel approach of training on chain-of-thought (CoT) reasoning paths sourced from medical textbooks through synthetic dataset creation, Meerkat-7B pushes the boundaries of what small language models can achieve in medical diagnostics and decision-making tasks. With 7 billion parameters, it not only excels on medical benchmarks but also offers an open alternative to the closed-source commercial LLMs whose use is constrained in sensitive fields such as healthcare.

Exceptional Benchmark Performance

Across a spectrum of seven medical benchmark datasets, Meerkat-7B has demonstrated remarkable proficiency, overshadowing its predecessors and even challenging the supremacy of significantly larger models like GPT-3.5. Specifically, Meerkat-7B achieved an average accuracy of 64.2%, presenting substantial improvements over GPT-3.5, MediTron-7B, and BioMistral-7B, by 13.1%, 13.4%, and 9.8% respectively. The model showcases its prowess in accurately addressing USMLE-style questions, exceeding the passing threshold with scores of 74.3% and 71.4% on MedQA and the USMLE Sample Test—benchmarks previously dominated by larger models.
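As a sanity check on the reported margins, the baseline averages implied by these figures can be recovered with simple arithmetic. This is a hedged sketch: it assumes the stated improvements are absolute differences in average accuracy, and the derived baseline numbers below are inferred from the deltas, not quoted from the paper.

```python
# Meerkat-7B's reported average accuracy across the seven benchmarks.
meerkat_avg = 64.2

# Reported improvements (accuracy points) over each baseline model.
improvements = {"GPT-3.5": 13.1, "MediTron-7B": 13.4, "BioMistral-7B": 9.8}

# Implied baseline averages, assuming the deltas are absolute differences.
baseline_avgs = {
    name: round(meerkat_avg - delta, 1) for name, delta in improvements.items()
}
print(baseline_avgs)
# {'GPT-3.5': 51.1, 'MediTron-7B': 50.8, 'BioMistral-7B': 54.4}
```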

Training with CoT and Synthetic Data

A cornerstone of Meerkat-7B's training regimen is the integration of CoT reasoning paths drawn from 18 medical textbooks. This approach enriches the model's understanding and reasoning skills and strengthens its capability to process and interpret complex medical information. Fine-tuning on both MedQA-CoT and the newly introduced MedBooks-CoT-18 dataset, a synthetic collection of question-answer pairs with reasoning paths derived from those textbooks, significantly enhanced the model's proficiency. This method of synthesizing data from authoritative sources has proven instrumental in bridging the gap between small models and their larger counterparts.
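The paper's exact data-synthesis pipeline is not reproduced here, but its general shape, prompting a teacher model to turn each textbook passage into a question with step-by-step reasoning and an answer, can be sketched as follows. The `ask_teacher_model` callable is a hypothetical stand-in for whatever LLM API is used, and the prompt wording is illustrative, not the paper's.

```python
import json

def build_cot_prompt(passage: str) -> str:
    """Illustrative prompt asking a teacher LLM to convert a textbook
    passage into a USMLE-style question with step-by-step reasoning."""
    return (
        "Read the following medical textbook passage and write a "
        "USMLE-style multiple-choice question based on it. Then answer "
        "it with step-by-step reasoning.\n\n"
        f"Passage:\n{passage}\n\n"
        "Return JSON with keys: question, options, reasoning, answer."
    )

def synthesize_cot_example(passage: str, ask_teacher_model) -> dict:
    """Produce one (question, chain-of-thought, answer) training example.

    `ask_teacher_model` is a hypothetical callable (prompt -> JSON string)
    standing in for a commercial LLM API.
    """
    raw = ask_teacher_model(build_cot_prompt(passage))
    example = json.loads(raw)
    # Keep only well-formed examples containing all expected fields.
    required = {"question", "options", "reasoning", "answer"}
    if not required.issubset(example):
        raise ValueError("teacher output missing required fields")
    return example
```

In practice such a pipeline would add filtering for factual and formatting quality before the examples are used for fine-tuning; the sketch above only checks structural completeness.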

Implications and Future Prospects

The development and open-source availability of Meerkat-7B present promising avenues for advancing AI-assisted medical diagnostics and decision-making processes. By achieving superior performance in medical benchmarks, the model narrows the existing performance gap between small-scale models and LLMs, offering a viable alternative for applications requiring sensitive data handling. The innovative approach of leveraging CoT reasoning and synthetic dataset generation from textbooks exemplifies a scalable strategy for enhancing small models' capabilities. As Meerkat-7B continues to evolve, its integration with preference alignment techniques and the development of retrieval-augmented methodologies stand as potential future directions to further refine its performance and reliability in real-world medical applications.

In conclusion, Meerkat-7B epitomizes a significant stride towards democratizing access to high-performing medical AI systems, underscoring the feasibility of small models in tackling intricate challenges that were once the sole domain of LLMs. This breakthrough heralds a new era for AI in medicine, emphasizing the importance of innovative data utilization and model training strategies.
