
Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks (2404.00376v2)

Published 30 Mar 2024 in cs.CL

Abstract: While recent advancements in commercial large language models (LMs) have shown promising results in medical tasks, their closed-source nature poses significant privacy and security concerns, hindering their widespread use in the medical field. Despite efforts to create open-source models, their limited parameters often result in insufficient multi-step reasoning capabilities required for solving complex medical problems. To address this, we introduce Meerkat, a new family of medical AI systems ranging from 7 to 70 billion parameters. The models were trained using our new synthetic dataset consisting of high-quality chain-of-thought reasoning paths sourced from 18 medical textbooks, along with diverse instruction-following datasets. Our systems achieved remarkable accuracy across six medical benchmarks, surpassing the previous best models such as MediTron and BioMistral, and GPT-3.5 by a large margin. Notably, Meerkat-7B surpassed the passing threshold of the United States Medical Licensing Examination (USMLE) for the first time for a 7B-parameter model, while Meerkat-70B outperformed GPT-4 by an average of 1.3%. Additionally, Meerkat-70B correctly diagnosed 21 out of 38 complex clinical cases, outperforming humans' 13.8 and closely matching GPT-4's 21.8. Our systems offered more detailed free-form responses to clinical queries compared to existing small models, approaching the performance level of large commercial models. This significantly narrows the performance gap with large LMs, showcasing its effectiveness in addressing complex medical challenges.


Summary

  • The paper demonstrates that integrating chain-of-thought reasoning with synthetic medical data significantly improves diagnostic accuracy in small language models.
  • It details a training regimen using data from 18 medical textbooks and synthetic CoT sets, achieving an average accuracy of 64.2% across seven benchmarks.
  • The model outperforms larger systems like GPT-3.5 on USMLE-style tests, marking a significant advancement in open-source medical AI.

Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks

Introduction to Meerkat-7B

Meerkat-7B is a 7-billion-parameter medical language model that shows how far smaller-scale models can be pushed on complex problem-solving. It is trained on chain-of-thought (CoT) reasoning paths, including synthetic data generated from medical textbooks, and targets medical diagnosis and decision-making tasks. Besides performing well on medical benchmarks, it offers an open alternative to commercial LLMs, whose closed-source nature limits their use in sensitive fields such as healthcare.

Exceptional Benchmark Performance

Across seven medical benchmark datasets, Meerkat-7B outperforms its predecessors and even significantly larger models such as GPT-3.5. It achieved an average accuracy of 64.2%, exceeding GPT-3.5, MediTron-7B, and BioMistral-7B by 13.1%, 13.4%, and 9.8%, respectively. On USMLE-style questions, it exceeded the passing threshold, scoring 74.3% on MedQA and 71.4% on the USMLE Sample Test, two benchmarks previously dominated by larger models.
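The margins quoted above imply baseline averages that the summary does not state directly. Here is a minimal sketch of that arithmetic, assuming the three margins are percentage-point differences between seven-benchmark averages. The variable and function names are illustrative and do not come from the paper.

```python
# Figures quoted in the summary above; all values are accuracy in percent.
MEERKAT_7B_AVG = 64.2
MARGINS = {
    "GPT-3.5": 13.1,
    "MediTron-7B": 13.4,
    "BioMistral-7B": 9.8,
}

def implied_baseline_averages(model_avg, margins):
    """Subtract each margin from the model average (assumes percentage points)."""
    return {name: round(model_avg - gap, 1) for name, gap in margins.items()}

baselines = implied_baseline_averages(MEERKAT_7B_AVG, MARGINS)
for name, avg in baselines.items():
    print(f"{name}: about {avg}% average accuracy")
```

If the margins were instead relative improvements, the implied baselines would differ, so these values are only as reliable as that assumption.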

Training with CoT and Synthetic Data

A cornerstone of Meerkat-7B's training is the use of CoT reasoning paths derived from 18 medical textbooks, which is intended to strengthen the model's reasoning and its ability to interpret complex medical information. The model was fine-tuned on both MedQA-CoT and the newly introduced MedBooks-CoT-18 dataset, a synthetic collection of question-answer pairs. Synthesizing training data from authoritative sources in this way helped narrow the gap between small models and their larger counterparts.
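To make the idea of a chain-of-thought training example concrete, here is a minimal sketch of how one multiple-choice question with a reasoning path might be turned into a prompt/target pair for supervised fine-tuning. The `CoTExample` class, its field names, the prompt wording, and the toy question are all hypothetical. The summary does not describe the actual format of MedQA-CoT or MedBooks-CoT-18.

```python
from dataclasses import dataclass

@dataclass
class CoTExample:
    """Hypothetical container for one question with a reasoning path."""
    question: str
    options: dict      # option letter -> option text
    reasoning: str     # step-by-step explanation (the chain of thought)
    answer: str        # letter of the correct option

def to_training_pair(ex):
    """Return (prompt, target); the target places the reasoning before the answer."""
    option_lines = "\n".join(f"({k}) {v}" for k, v in sorted(ex.options.items()))
    prompt = f"Question: {ex.question}\n{option_lines}\nAnswer step by step."
    target = f"{ex.reasoning}\nTherefore, the answer is ({ex.answer})."
    return prompt, target

# Toy example written for illustration; it is not taken from either dataset.
example = CoTExample(
    question="Which vitamin deficiency classically causes scurvy?",
    options={"A": "Vitamin A", "B": "Vitamin C", "C": "Vitamin D"},
    reasoning="Scurvy results from impaired collagen synthesis, which requires vitamin C.",
    answer="B",
)
prompt, target = to_training_pair(example)
print(prompt)
print(target)
```

Placing the reasoning before the final answer in the target is the design choice that matters here: the model is trained to produce the intermediate steps rather than only the option letter.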

Implications and Future Prospects

The development and open-source availability of Meerkat-7B open promising avenues for AI-assisted medical diagnosis and decision-making. By performing strongly on medical benchmarks, the model narrows the performance gap between small-scale models and large language models, offering a viable alternative for applications that handle sensitive data. Generating CoT training data synthetically from textbooks is a scalable strategy for improving small models' capabilities. Potential future directions include preference alignment techniques and retrieval-augmented methods, which could further improve the model's performance and reliability in real-world medical applications.

In conclusion, Meerkat-7B is a significant step towards broader access to high-performing medical AI systems. It shows that small models can tackle intricate problems that were once handled only by large language models, and it underscores the importance of how training data is sourced and how models are trained.
