BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text (2403.18421v1)
Abstract: Models such as GPT-4 and Med-PaLM 2 have demonstrated impressive performance on a wide variety of biomedical NLP tasks. However, these models have hundreds of billions of parameters, are computationally expensive to run, require users to send their input data over the internet, and are trained on unknown data sources. Can smaller, more targeted models compete? To address this question, we build and release BioMedLM, a 2.7 billion parameter GPT-style autoregressive model trained exclusively on PubMed abstracts and full articles. When fine-tuned, BioMedLM produces strong multiple-choice biomedical question-answering results competitive with much larger models, achieving 57.3% on MedMCQA (dev) and 69.0% on the MMLU Medical Genetics exam. BioMedLM can also be fine-tuned to produce useful answers to patient questions on medical topics. This demonstrates that smaller models can potentially serve as transparent, privacy-preserving, economical, and environmentally friendly foundations for particular NLP applications, such as in biomedicine. The model is available on the Hugging Face Hub: https://huggingface.co/stanford-crfm/BioMedLM.
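Since the checkpoint is public, the quickest way to inspect its behavior is through the transformers library. The snippet below is a minimal sketch, assuming the stanford-crfm/BioMedLM checkpoint loads with the standard causal-LM auto classes; the prompt and decoding settings are illustrative, not taken from the paper.

```python
# Minimal sketch: load BioMedLM from the Hugging Face Hub and sample a
# continuation. Assumes the checkpoint works with the standard causal-LM
# auto classes; the prompt and decoding settings are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("stanford-crfm/BioMedLM")
model = AutoModelForCausalLM.from_pretrained(
    "stanford-crfm/BioMedLM",
    torch_dtype=torch.float16,  # at 2.7B parameters, fp16 fits on one ~16 GB GPU
    device_map="auto",          # requires the accelerate package
)

prompt = "Metformin lowers blood glucose by"  # hypothetical prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that the raw pretrained model only continues text; the question-answering scores quoted in the abstract come from supervised fine-tuning on each benchmark.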
- The promise of large language models in health care. The Lancet, 401(10377):641, 2023. doi: 10.1016/s0140-6736(23)00216-7.
- SciBERT: A pretrained language model for scientific text, 2019. URL https://arxiv.org/abs/1903.10676.
- On the summarization of consumer health questions. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2228–2234, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1215. URL https://aclanthology.org/P19-1215.
- GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow, March 2021. URL https://doi.org/10.5281/zenodo.5297715.
- GPT-NeoX-20B: An open-source autoregressive language model, 2022. URL https://arxiv.org/abs/2204.06745.
- Language models are few-shot learners, 2020. URL https://arxiv.org/abs/2005.14165.
- MedBLIP: Bootstrapping language-image pre-training from 3D medical images and texts, 2023. URL https://arxiv.org/abs/2305.10799.
- PaLM: Scaling language modeling with pathways, 2022. URL https://arxiv.org/abs/2204.02311.
- Understanding accountability in algorithmic supply chains. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’23, page 1186–1197, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9798400701924. doi: 10.1145/3593013.3594073. URL https://doi.org/10.1145/3593013.3594073.
- FlashAttention: Fast and memory-efficient exact attention with IO-awareness, 2022. URL https://arxiv.org/abs/2205.14135.
- Harm De Vries. Go smol or go home, 2023. URL https://www.harmdevries.com/post/model-size-vs-compute-overhead/.
- Informed named entity recognition decoding for generative language models, 2023. URL https://arxiv.org/abs/2308.07791.
- Summarization of clinical information: A conceptual model. Journal of Biomedical Informatics, 44(4):688–699, 2011. doi: 10.1016/j.jbi.2011.03.008.
- The Pile: An 800GB dataset of diverse text for language modeling, 2020. URL https://arxiv.org/abs/2101.00027.
- News summarization and evaluation in the era of GPT-3, 2023. URL https://arxiv.org/abs/2209.12356.
- OLMo: Accelerating the science of language models, 2024. URL https://arxiv.org/abs/2402.00838.
- Domain-specific language model pretraining for biomedical natural language processing. ACM Transactions on Computing for Healthcare, 3(1):1–23, 2021. doi: 10.1145/3458754.
- Measuring massive multitask language understanding, 2021. URL https://arxiv.org/abs/2009.03300.
- Hugging Face. huggingface/tokenizers: Fast state-of-the-art tokenizers optimized for research and production, 2019. URL https://github.com/huggingface/tokenizers.
- What disease does this patient have? A large-scale open domain question answering dataset from medical exams. Applied Sciences, 11(14):6421, Jul 2021. ISSN 2076-3417. doi: 10.3390/app11146421. URL https://doi.org/10.3390/app11146421.
- PubMedQA: A dataset for biomedical research question answering, 2019. URL https://arxiv.org/abs/1909.06146.
- GeneGPT: Augmenting large language models with domain tools for improved access to biomedical information, 2023. URL https://arxiv.org/abs/2304.09667.
- On the societal impact of open foundation models, 2024. URL https://crfm.stanford.edu/open-fms/paper.pdf.
- Mistral — a journey towards reproducible language model training, 2021. URL https://crfm.stanford.edu/2021/08/26/mistral.html.
- Leveraging pre-trained language models for mining microbiome-disease relationships. BMC Bioinformatics, 24(290), 2023. doi: 10.1186/s12859-023-05411-z.
- Dense passage retrieval for open-domain question answering, 2020. URL https://arxiv.org/abs/2004.04906.
- Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digital Health, 2(2):e0000198, 2023. doi: 10.1371/journal.pdig.0000198.
- BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4):1234–1240, Sep 2019. doi: 10.1093/bioinformatics/btz682. URL https://doi.org/10.1093/bioinformatics/btz682.
- Summary of ChatGPT/GPT-4 research and perspective towards the future of large language models, 2023. URL https://arxiv.org/abs/2304.01852.
- Decoupled weight decay regularization, 2019. URL https://arxiv.org/abs/1711.05101.
- Analyzing leakage of personally identifiable information in language models, 2023. URL https://arxiv.org/abs/2302.00539.
- BioGPT: Generative pre-trained transformer for biomedical text generation and mining. Briefings in Bioinformatics, 23(6), 2022. doi: 10.1093/bib/bbac409.
- AI chatbots, health privacy, and challenges to HIPAA compliance. JAMA, 330(4):309, 2023. doi: 10.1001/jama.2023.9458.
- The imperative for regulatory oversight of large language models (or generative AI) in healthcare. npj Digital Medicine, 6(1), 2023. doi: 10.1038/s41746-023-00873-0.
- MosaicML. Composer, 2021. URL https://github.com/mosaicml/composer/.
- MedKnowts: Unified documentation and information retrieval for electronic health records. In The 34th Annual ACM Symposium on User Interface Software and Technology. ACM, October 2021. doi: 10.1145/3472749.3474814. URL https://doi.org/10.1145/3472749.3474814.
- Capabilities of GPT-4 on medical challenge problems, 2023a. URL https://arxiv.org/abs/2303.13375.
- Can generalist foundation models outcompete special-purpose tuning? Case study in medicine, 2023b. URL https://arxiv.org/abs/2311.16452.
- Training language models to follow instructions with human feedback, 2022. URL https://arxiv.org/abs/2203.02155.
- MedMCQA: A large-scale multi-subject multi-choice dataset for medical domain question answering, 2022. URL https://arxiv.org/abs/2203.14371.
- Comparative performance evaluation of large language models for extracting molecular interactions and pathway knowledge, 2023. URL https://arxiv.org/abs/2307.08813.
- PyTorch: An imperative style, high-performance deep learning library, 2019. URL https://arxiv.org/abs/1912.01703.
- Carbon emissions and large neural network training, 2021. URL https://arxiv.org/abs/2104.10350.
- Language models are unsupervised multitask learners, 2019. URL https://api.semanticscholar.org/CorpusID:160025533.
- Efficient domain adaptation of language models via adaptive tokenization, 2021. URL https://arxiv.org/abs/2109.07460.
- Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany, August 2016. Association for Computational Linguistics. doi: 10.18653/v1/P16-1162. URL https://aclanthology.org/P16-1162.
- Compute trends across three eras of machine learning, 2022. URL https://arxiv.org/abs/2202.05924.
- Creation and adoption of large language models in medicine. JAMA, 330(9):866, 2023. doi: 10.1001/jama.2023.14217.
- The cost of training NLP models: A concise overview, 2020. URL https://arxiv.org/abs/2004.08900.
- Large language models encode clinical knowledge. Nature, 620(7972):172–180, 2023a. doi: 10.1038/s41586-023-06291-2.
- Towards expert-level medical question answering with large language models, 2023b. URL https://arxiv.org/abs/2305.09617.
- Dolma: An open corpus of three trillion tokens for language model pretraining research, 2024. URL https://arxiv.org/abs/2402.00159.
- Galactica: A large language model for science, 2022. URL https://arxiv.org/abs/2211.09085.
- Large language models in medicine. Nature Medicine, 29(8):1930–1940, 2023. doi: 10.1038/s41591-023-02448-8.
- Opportunities and challenges for ChatGPT and large language models in biomedicine and health, 2023. URL https://arxiv.org/abs/2306.10070.
- Together. Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models, May 2023a. URL https://www.together.ai/blog/redpajama-models-v1.
- Together. RedPajama: An open dataset for training large language models, October 2023b. URL https://github.com/togethercomputer/RedPajama-Data.
- LLaMA: Open and efficient foundation language models, 2023. URL https://arxiv.org/abs/2302.13971.
- An overview of the BioASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinformatics, 16(1), 2015. doi: 10.1186/s12859-015-0564-6.
- Open-ended medical visual question answering through prefix tuning of language models, 2023. URL https://arxiv.org/abs/2303.05977.
- Attention is all you need, 2017. URL https://arxiv.org/abs/1706.03762.
- GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model, May 2021. URL https://github.com/kingoflolz/mesh-transformer-jax.
- A systematic review of automatic text summarization for biomedical literature and EHRs. Journal of the American Medical Informatics Association, 28(10):2287–2297, 2021. doi: 10.1093/jamia/ocab143.
- BFloat16: The secret to high performance on Cloud TPUs, 2019. URL https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus.
- Zuoxi Yang. Biomedical information retrieval incorporating knowledge graph for explainable precision medicine. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’20, page 2486, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450380164. doi: 10.1145/3397271.3401458. URL https://doi.org/10.1145/3397271.3401458.
- Deep bidirectional language-knowledge graph pretraining, 2022a. URL https://arxiv.org/abs/2210.09338.
- LinkBERT: Pretraining language models with document links, 2022b. URL https://arxiv.org/abs/2203.15827.
- Appraising the potential uses and harms of LLMs for medical systematic reviews, 2023. URL https://arxiv.org/abs/2305.11828.
- Benchmarking large language models for news summarization, 2023. URL https://arxiv.org/abs/2301.13848.
- Learning to summarize radiology findings, 2018. URL https://arxiv.org/abs/1809.04698.
- A survey of large language models, 2023. URL https://arxiv.org/abs/2303.18223.
- When does pretraining help? In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law, 2021. doi: 10.1145/3462757.3466088.
- Improving the transferability of clinical note section classification models with BERT and large language model ensembles. In Proceedings of the 5th Clinical Natural Language Processing Workshop, pages 125–130, Toronto, Canada, July 2023. Association for Computational Linguistics. doi: 10.18653/v1/2023.clinicalnlp-1.16. URL https://aclanthology.org/2023.clinicalnlp-1.16.
- Fine-tuning language models from human preferences, 2020. URL https://arxiv.org/abs/1909.08593.