SoftTiger: A Clinical Foundation Model for Healthcare Workflows (2403.00868v3)
Abstract: We introduce SoftTiger, a clinical large language model (CLaM) designed as a foundation model for healthcare workflows. The narrative, unstructured nature of clinical notes is a major obstacle to healthcare intelligentization. We address the critical problem of structuring clinical notes into clinical data according to international interoperability standards. We collect and annotate data for three subtasks: international patient summary, clinical impression, and medical encounter. We then perform supervised fine-tuning of a state-of-the-art LLM on public and credentialed clinical data. Training is orchestrated so that the target model first learns basic clinical tasks, such as abbreviation expansion and temporal information extraction, and then learns to perform more complex downstream clinical tasks. We also address several modeling challenges specific to the healthcare context, e.g., an extra-long context window. Our blind pairwise evaluation shows that SoftTiger outperforms other popular open-source models and GPT-3.5, performs comparably to Gemini-pro, and trails GPT-4 by a modest margin. We believe LLMs can become a stepping stone toward healthcare digitalization and democratization. We therefore publicly release SoftTiger models at scales of 13 billion and 70 billion parameters, together with the datasets and code for our scalable evaluation, in the hope of making a significant contribution to the healthcare industry.
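The blind pairwise evaluation described above compares two models' outputs on the same prompt without revealing model identities, then aggregates judgments into per-model scores. A minimal sketch of such aggregation (the function name, tie-scoring of 0.5, and the sample judgments are illustrative assumptions, not the paper's actual protocol or results):

```python
def blind_pairwise_winrate(judgments):
    """Aggregate blind pairwise judgments into per-model win rates.

    `judgments` is a list of (model_a, model_b, winner) tuples, where
    winner is model_a, model_b, or None for a tie.
    """
    wins, games = {}, {}
    for a, b, winner in judgments:
        for m in (a, b):
            games[m] = games.get(m, 0) + 1
            wins.setdefault(m, 0.0)
        if winner is None:      # tie: credit half a win to each side
            wins[a] += 0.5
            wins[b] += 0.5
        else:
            wins[winner] += 1.0
    return {m: wins[m] / games[m] for m in games}

# Hypothetical blinded comparisons on a shared prompt set.
judgments = [
    ("model_x", "model_y", "model_x"),
    ("model_x", "model_z", None),
    ("model_y", "model_z", "model_z"),
]
rates = blind_pairwise_winrate(judgments)
```

Aggregating per-pair win rates rather than raw counts keeps models comparable even when they appear in different numbers of matchups.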
- Ye Chen
- Igor Couto
- Wei Cai
- Cong Fu
- Bruno Dorneles