ED-Copilot: Reduce Emergency Department Wait Time with Language Model Diagnostic Assistance (2402.13448v2)

Published 21 Feb 2024 in cs.CL, cs.AI, and cs.LG

Abstract: In the emergency department (ED), patients undergo triage and multiple laboratory tests before diagnosis. This time-consuming process causes ED crowding which impacts patient mortality, medical errors, staff burnout, etc. This work proposes (time) cost-effective diagnostic assistance that leverages artificial intelligence systems to help ED clinicians make efficient and accurate diagnoses. In collaboration with ED clinicians, we use public patient data to curate MIMIC-ED-Assist, a benchmark for AI systems to suggest laboratory tests that minimize wait time while accurately predicting critical outcomes such as death. With MIMIC-ED-Assist, we develop ED-Copilot which sequentially suggests patient-specific laboratory tests and makes diagnostic predictions. ED-Copilot employs a pre-trained bio-medical LLM to encode patient information and uses reinforcement learning to minimize ED wait time and maximize prediction accuracy. On MIMIC-ED-Assist, ED-Copilot improves prediction accuracy over baselines while halving average wait time from four hours to two hours. ED-Copilot can also effectively personalize treatment recommendations based on patient severity, further highlighting its potential as a diagnostic assistant. Since MIMIC-ED-Assist is a retrospective benchmark, ED-Copilot is restricted to recommend only observed tests. We show ED-Copilot achieves competitive performance without this restriction as the maximum allowed time increases. Our code is available at https://github.com/cxcscmu/ED-Copilot.


Summary

  • The paper introduces ED-Copilot, which combines BioGPT with reinforcement learning to personalize laboratory test recommendations, halving average ED wait time from four hours to two hours.
  • ED-Copilot is evaluated on the MIMIC-ED-Assist benchmark, where it improves prediction accuracy over baselines for critical outcomes such as mortality and ICU transfer.
  • ED-Copilot tailors its laboratory test suggestions to each patient's data, with the aim of supporting timely and accurate diagnoses.

ED-Copilot: Reduce Emergency Department Wait Time with Language Model Diagnostic Assistance

Introduction

The paper "ED-Copilot: Reduce Emergency Department Wait Time with Language Model Diagnostic Assistance" targets Emergency Department (ED) crowding, which the authors link to higher patient mortality, medical errors, and staff burnout. Patients typically go through triage and multiple laboratory tests before a diagnosis, and this time-consuming process contributes to crowding. The work introduces ED-Copilot, a language-model-based diagnostic assistant that suggests which laboratory tests to order and predicts critical outcomes, with the goal of shortening wait time while keeping outcome prediction accurate.

MIMIC-ED-Assist Benchmark

The MIMIC-ED-Assist benchmark is a central component of this work. Curated in collaboration with ED clinicians from publicly available MIMIC-IV patient records, it supports the study of time-cost-effective diagnostic assistance. The benchmark evaluates how well AI systems suggest laboratory tests that minimize wait time while accurately predicting critical outcomes such as patient mortality and ICU transfer. Because it is built from retrospective records, a system evaluated on MIMIC-ED-Assist can only recommend tests that were actually performed for each patient. By mirroring real-world laboratory test ordering, the benchmark provides a basis for estimating how AI-driven test suggestions might affect ED operations.
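Because the benchmark is retrospective, only tests that were actually performed for a visit have recorded results, so a recommender can only suggest observed tests (the abstract states this restriction for ED-Copilot). A minimal sketch of that restriction expressed as an action mask, assuming tests are organized into named lab groups; the group names in the hypothetical `ED_LAB_GROUPS` list are placeholders, not the benchmark's actual grouping.

```python
from typing import Dict, List, Optional

# Hypothetical lab-group names (placeholders); the benchmark's real grouping may differ.
ED_LAB_GROUPS: List[str] = ["CBC", "Chemistry", "Coagulation", "Urinalysis", "Lactate"]


def observed_action_mask(
    lab_results: Dict[str, Optional[dict]],
    already_ordered: List[str],
) -> List[bool]:
    """Return one flag per lab group: True if the group may still be suggested.

    In retrospective data a group is only suggestible if it was actually
    performed for this visit (its results are present) and it has not already
    been suggested earlier in the episode.
    """
    mask = []
    for group in ED_LAB_GROUPS:
        was_performed = lab_results.get(group) is not None
        mask.append(was_performed and group not in already_ordered)
    return mask


visit = {"CBC": {"wbc": 12.1}, "Chemistry": {"creatinine": 1.4}, "Lactate": None}
print(observed_action_mask(visit, already_ordered=["CBC"]))
# prints: [False, True, False, False, False]
```

Masking, rather than letting the policy pick an unobserved test, keeps every suggestion evaluable against recorded results; the cost is that the policy can never explore tests the clinicians did not order.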

Methodology: ED-Copilot System

ED-Copilot sequentially suggests patient-specific laboratory tests and then makes a diagnostic prediction. It builds on BioGPT, a pre-trained biomedical LLM, which is fine-tuned on patient data to encode patient information such as triage data and laboratory results. Reinforcement learning (RL) trains the test-selection policy: at each step ED-Copilot decides whether to suggest another laboratory test or to stop and predict, balancing two objectives, reducing wait time (ED length of stay) and maintaining prediction accuracy for critical outcomes (Figure 1).

Figure 1: Overview of ED-Copilot training flow on one ED visit.
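The sequential loop described for ED-Copilot can be pictured as a small decision process: at each step the agent either orders one more lab group, paying its time cost, or stops and predicts. The sketch below is a toy illustration under stated assumptions, not ED-Copilot's implementation. The text serialization, the per-group times in `GROUP_HOURS`, the trade-off weight `lam`, the reward shape (a time penalty per ordered group plus a terminal bonus for a correct prediction), and the rule-based `toy_policy` and `toy_predict`, which stand in for the BioGPT-based RL policy and predictor, are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical turnaround times (hours) per lab group; not values from the paper.
GROUP_HOURS: Dict[str, float] = {"CBC": 1.0, "Chemistry": 1.5, "Lactate": 0.5}
STOP = "STOP"  # action meaning "stop ordering tests and predict"


def serialize(triage: Dict[str, float], labs: Dict[str, Dict[str, float]]) -> str:
    """Linearize observed features into a text string an LLM encoder could read."""
    parts = [f"{name}: {value}" for name, value in triage.items()]
    for group, values in labs.items():
        parts += [f"{group}.{name}: {value}" for name, value in values.items()]
    return "; ".join(parts)


def run_episode(
    triage: Dict[str, float],
    recorded_labs: Dict[str, Dict[str, float]],
    label: int,
    policy: Callable[[str, List[str]], str],
    predict: Callable[[str], int],
    lam: float = 0.1,
) -> Tuple[float, float, List[str]]:
    """Roll out one ED visit and return (total reward, lab hours, ordered groups)."""
    observed: Dict[str, Dict[str, float]] = {}
    ordered: List[str] = []
    reward, hours = 0.0, 0.0
    while True:
        state = serialize(triage, observed)
        # recorded_labs holds only groups actually performed (retrospective data),
        # so the policy chooses among those not yet ordered.
        allowed = [g for g in recorded_labs if g not in ordered]
        action = policy(state, allowed) if allowed else STOP
        if action == STOP:
            # Terminal reward: +1 for a correct critical-outcome prediction.
            reward += 1.0 if predict(state) == label else 0.0
            return reward, hours, ordered
        ordered.append(action)
        observed[action] = recorded_labs[action]
        hours += GROUP_HOURS[action]
        reward -= lam * GROUP_HOURS[action]  # time penalty per ordered group


def toy_policy(state: str, allowed: List[str]) -> str:
    """Stand-in for a learned policy: order a CBC if possible, otherwise stop."""
    return "CBC" if "CBC" in allowed else STOP


def toy_predict(state: str) -> int:
    """Stand-in classifier using an arbitrary toy rule: flag WBC above 11."""
    for part in state.split("; "):
        if part.startswith("CBC.wbc: "):
            return int(float(part.split(": ")[1]) > 11)
    return 0


triage = {"heart_rate": 118.0, "sbp": 92.0}
labs = {"CBC": {"wbc": 15.2}, "Chemistry": {"creatinine": 1.4}}
print(run_episode(triage, labs, label=1, policy=toy_policy, predict=toy_predict))
```

Raising `lam` makes each test more expensive relative to a correct prediction, so a trained policy would be pushed toward ordering fewer groups. This mirrors the time-versus-accuracy trade-off the paper describes, though its actual state encoding and reward may be shaped differently.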

Results

The empirical analysis on MIMIC-ED-Assist shows that ED-Copilot improves prediction accuracy over baselines while halving the average laboratory testing time, from four hours to two. Ablation studies indicate that both model scale and pre-training on biomedical corpora contribute to the LLM's performance. Personalized laboratory test suggestions are also shown to be crucial for diagnosing severe cases, highlighting ED-Copilot's capacity to adapt to patient-specific presentations (Figure 2).

Figure 2: Critical Outcome F1.

Figure 3: Accuracy.
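To make the reported metrics concrete, here is a minimal sketch of how positive-class F1, accuracy, and average lab time per patient could be computed from per-patient outputs. It is not the paper's evaluation code: the binary labels, the hypothetical `GROUP_HOURS` table, and the choice to sum group times (which treats groups as processed one after another) are assumptions, and the paper's own time accounting may differ.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical per-group turnaround times in hours; not values from the paper.
GROUP_HOURS: Dict[str, float] = {"CBC": 1.0, "Chemistry": 1.5, "Coagulation": 1.0, "Lactate": 0.5}


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive (critical-outcome) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of patients whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_lab_hours(suggested: List[Set[str]]) -> float:
    """Average lab time per patient, assuming group times simply add up."""
    return sum(sum(GROUP_HOURS[g] for g in groups) for groups in suggested) / len(suggested)


y_true = [1, 0, 0, 1]
y_pred = [1, 0, 0, 0]
print(f1_score(y_true, y_pred), accuracy(y_true, y_pred))
print(mean_lab_hours([{"CBC"}, {"CBC", "Chemistry"}, set(), {"Lactate"}]))
```

F1 on the positive class is the natural companion to accuracy here, since critical outcomes are a minority and accuracy alone can look high for a model that rarely flags anyone.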

Personalized Diagnostic Assistance

A key feature of ED-Copilot is personalized diagnostic assistance. Unlike non-personalized methods, which select tests at the level of the whole patient population, ED-Copilot tailors laboratory test recommendations to each patient's data and severity, which helps with unusual clinical presentations. The paper shows that this personalized approach helps identify high-risk patients whom population-level models might overlook, a property relevant to equitable care (Figure 4).

Figure 4: Fraction of patients for whom each lab group was performed, and for whom ED-Copilot suggested it. On average, each patient underwent 4.7 lab groups, while the cost-effective ED-Copilot suggested 2.4.
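Figure 4 rests on two simple statistics: for each lab group, the fraction of patients who had it performed (or were suggested it), and the average number of groups per patient. A minimal sketch of those computations on hypothetical toy data (the group names and visits are invented for illustration); the last line only reworks the reported averages, 2.4 suggested versus 4.7 performed groups, into a relative reduction.

```python
from collections import Counter
from typing import Dict, List, Set


def group_fractions(per_patient: List[Set[str]]) -> Dict[str, float]:
    """Fraction of patients for whom each lab group appears (one bar per group)."""
    counts = Counter(group for groups in per_patient for group in groups)
    return {group: n / len(per_patient) for group, n in sorted(counts.items())}


def mean_groups(per_patient: List[Set[str]]) -> float:
    """Average number of lab groups per patient."""
    return sum(len(groups) for groups in per_patient) / len(per_patient)


# Toy data for three hypothetical visits.
performed = [{"CBC", "Chemistry", "Lactate"}, {"CBC", "Chemistry"}, {"CBC"}]
suggested = [{"CBC", "Lactate"}, {"Chemistry"}, set()]

print(group_fractions(performed))
print(mean_groups(performed), mean_groups(suggested))

# Same arithmetic applied to the averages reported in Figure 4.
print(f"{1 - 2.4 / 4.7:.0%} fewer lab groups suggested than performed")
# prints: 49% fewer lab groups suggested than performed
```

The reduction in lab groups (about half) lines up with the reported halving of testing time, although fewer groups need not translate one-to-one into fewer hours if group turnaround times differ.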

Conclusion

ED-Copilot is a promising step toward AI-driven tools for mitigating ED crowding: on a retrospective benchmark, it reduces laboratory testing time and improves outcome-prediction accuracy. By combining a biomedical LLM with reinforcement learning, the system is designed to augment clinicians' decisions about test ordering rather than replace them. Because MIMIC-ED-Assist is retrospective, ED-Copilot is restricted to recommending observed tests; the authors report that it achieves competitive performance without this restriction as the maximum allowed time increases. Future research could extend AI-driven diagnostic assistance to broader healthcare contexts to further reduce clinical bottlenecks.

