LLM-Based Section Identifiers Excel on Open Source but Stumble in Real World Applications (2404.16294v1)

Published 25 Apr 2024 in cs.CL and cs.AI

Abstract: Electronic health records (EHRs), though a boon for healthcare practitioners, are growing longer and more convoluted every day. Sifting through these lengthy EHRs is taxing and has become a cumbersome part of physician-patient interaction. Several approaches have been proposed to alleviate this prevalent issue, via either summarization or sectioning; however, only a few have truly been helpful in the past. With the rise of automated methods, machine learning (ML) has shown promise in identifying relevant sections in EHRs. However, most ML methods rely on labeled data, which is difficult to obtain in healthcare. Large language models (LLMs), on the other hand, have achieved impressive feats in NLP, even in a zero-shot manner, i.e., without any labeled data. To that end, we propose using LLMs to identify relevant section headers. We find that GPT-4 can effectively solve the task in both zero-shot and few-shot settings and segments dramatically better than state-of-the-art methods. Additionally, we annotate a much harder real-world dataset and find that GPT-4 struggles to perform well on it, pointing to the need for further research and harder benchmarks.

