
Medical Data Augmentation via ChatGPT: A Case Study on Medication Identification and Medication Event Classification (2306.07297v1)

Published 10 Jun 2023 in cs.CL, cs.AI, and cs.LG

Abstract: Identifying key factors such as medications, diseases, and their relationships in electronic health records (EHRs) and clinical notes has a wide range of clinical applications. The N2C2 2022 competitions presented several tasks to promote the identification of such factors in EHRs using the Contextualized Medication Event Dataset (CMED), and pretrained large language models (LLMs) demonstrated exceptional performance on them. This study explores the use of LLMs, specifically ChatGPT, for data augmentation to overcome the limited availability of annotated data for identifying key factors in EHRs. In addition, several pre-trained BERT models, initially trained on large corpora such as Wikipedia and MIMIC, were fine-tuned on the augmented datasets to identify these factors. Experimental results on two EHR analysis tasks, medication identification and medication event classification, indicate that ChatGPT-based data augmentation improves performance on both.
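
The abstract does not give the prompts or filtering rules used, so the following is only a minimal sketch of what label-preserving augmentation for medication identification could look like. The names `augment` and `toy_paraphrase` are hypothetical, and `toy_paraphrase` is a rule-based stand-in for a ChatGPT call so that the example runs offline. The one idea it illustrates is that a paraphrase is kept only if every annotated medication mention survives verbatim, so the original labels remain valid.

```python
from typing import Callable, List, Tuple

# One training example: (sentence, medication mentions annotated in it).
Example = Tuple[str, List[str]]


def augment(
    examples: List[Example],
    paraphrase: Callable[[str, int], List[str]],
    n_variants: int = 2,
) -> List[Example]:
    """Return the originals plus paraphrases that keep every medication mention."""
    augmented = list(examples)
    for text, meds in examples:
        seen = {text}
        for variant in paraphrase(text, n_variants):
            # Keep a variant only if it is new and still contains each
            # annotated medication verbatim, so the labels carry over.
            if variant not in seen and all(m in variant for m in meds):
                augmented.append((variant, meds))
                seen.add(variant)
    return augmented


def toy_paraphrase(text: str, n: int) -> List[str]:
    """Hypothetical stand-in for an LLM paraphrasing call (assumption, not the paper's method)."""
    candidates = [
        text.replace("Patient was started on", "We initiated"),
        text.replace("Patient was started on", "The patient began taking"),
        # This candidate drops the medication name and should be filtered out.
        text.replace("lisinopril", "the medication"),
    ]
    return candidates[:n]


examples = [("Patient was started on lisinopril 10 mg daily.", ["lisinopril"])]
augmented = augment(examples, toy_paraphrase, n_variants=3)
for sentence, meds in augmented:
    print(meds, "|", sentence)
```

In a real pipeline the stand-in would be replaced by an actual LLM request, and the augmented set would then be used to fine-tune a BERT-style tagger; both steps are outside what this sketch covers.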

