Fine-tuning pre-trained extractive QA models for clinical document parsing (2312.02314v1)

Published 4 Dec 2023 in cs.CL and cs.AI

Abstract: Electronic health records (EHRs) contain a vast amount of high-dimensional multi-modal data that can accurately represent a patient's medical history. Unfortunately, most of this data is either unstructured or semi-structured, rendering it unsuitable for real-time and retrospective analyses. A remote patient monitoring (RPM) program for Heart Failure (HF) patients needs to have access to clinical markers like EF (Ejection Fraction) or LVEF (Left Ventricular Ejection Fraction) in order to ascertain eligibility and appropriateness for the program. This paper explains a system that can parse echocardiogram reports and verify EF values. This system helps identify eligible HF patients who can be enrolled in such a program. At the heart of this system is a pre-trained extractive QA transformer model that is fine-tuned on custom-labeled data. The methods used to prepare such a model for deployment are illustrated by running experiments on a public clinical dataset like MIMIC-IV-Note. The pipeline can be used to generalize solutions to similar problems in a low-resource setting. We found that the system saved over 1500 hours for our clinicians over 12 months by automating the task at scale.

Summary

The paper demonstrates that fine-tuning a pre-trained extractive QA model significantly improves extraction of ejection fraction values from clinical echocardiogram reports.
It employs OCR and PHI redaction to convert unstructured clinical documents into processable text, ensuring patient privacy.
The study reports a notable increase in EM accuracy and F1 score after fine-tuning, and the deployed system saved clinicians over 1500 hours of screening time over 12 months.

Automating Clinical Document Parsing with Pre-Trained Extractive QA Models

The manual review of clinical documents, such as echocardiogram reports, is a critical yet time-consuming task for clinicians. To speed up this process, particularly when identifying heart failure (HF) patients for remote patient monitoring (RPM) programs, the researchers developed a system that automates the parsing of clinical documents. Central to the system is a pre-trained extractive Question Answering (QA) model, which locates specific information in the text. In this case, that information is the ejection fraction (EF) value in echocardiogram reports, a key metric in heart failure diagnosis and management.

How the System Works

Echocardiogram reports provide essential data for diagnosing and managing heart failure but are often formatted as unstructured or semi-structured PDFs, which complicates data extraction. The presented system uses Optical Character Recognition (OCR) to convert these reports into text, then redacts protected health information (PHI) to preserve privacy, and finally uses a pre-trained extractive QA model to pinpoint and verify EF values within the text. This model, originally trained on general datasets, is fine-tuned on custom-labeled clinical documents to improve its performance in the medical domain.
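The redaction, question answering, and verification steps described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the OCR step is assumed to have already produced plain text, the regex patterns in `PHI_PATTERNS` are hypothetical placeholders for a real PHI redaction tool, and `toy_qa_model` is a stand-in for the fine-tuned extractive QA model (any callable taking a question and a context would fit).

```python
import re
from typing import Callable, Optional

# Hypothetical redaction rules; a production system would use a dedicated PHI tool.
PHI_PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE), "[MRN]"),
]


def redact_phi(text: str) -> str:
    """Replace every match of each pattern with its placeholder."""
    for pattern, placeholder in PHI_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def parse_ef(answer: str) -> Optional[float]:
    """Convert an answer span such as '55-60%' or '35 %' into a number.

    Ranges are reduced to their midpoint (an assumption made for this sketch).
    Returns None when the span does not look like a plausible EF percentage.
    """
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", answer)]
    if not numbers or len(numbers) > 2:
        return None
    value = sum(numbers) / len(numbers)
    return value if 0 < value <= 100 else None


def toy_qa_model(question: str, context: str) -> str:
    """Stand-in for an extractive QA model: returns the span following 'EF'."""
    match = re.search(r"EF[^\d]{0,20}(\d+(?:\s*-\s*\d+)?\s*%)", context)
    return match.group(1) if match else ""


def extract_ef(
    report_text: str,
    qa_model: Callable[[str, str], str],
    question: str = "What is the ejection fraction?",
) -> Optional[float]:
    """Redact the OCR text, ask the QA model for the EF span, then verify it."""
    clean_text = redact_phi(report_text)
    answer_span = qa_model(question, clean_text)
    return parse_ef(answer_span)


report = "Patient MRN: 123456 seen 01/02/2023. LVEF is 55-60%."
print(redact_phi(report))
print(extract_ef(report, toy_qa_model))  # 57.5
```

Keeping the QA model behind a plain callable makes it straightforward to swap the toy function for a real fine-tuned model without touching the redaction or verification code.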

The researchers argue that this system makes the identification of eligible HF patients substantially faster, significantly reducing screening time for clinicians. In practical terms, the system has reportedly saved over 1500 clinician hours in a year by automating EF value extraction at scale.

The Experiment and Its Findings

The system's efficacy was demonstrated using a public clinical dataset called MIMIC-IV-Note. The extractive QA model at the heart of the system was fine-tuned with custom-labeled echocardiogram report data to adapt its capabilities to the relevant domain. Although the real dataset for the heart failure RPM program is not publicly shared for confidentiality reasons, the researchers used MIMIC-IV-Note as a stand-in to simulate and verify their methods. The experiments showed a notable increase in performance metrics such as Exact Match (EM) accuracy and F1 score for locating EF values accurately within the text after fine-tuning.
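For readers unfamiliar with the two metrics, the sketch below shows how Exact Match and token-level F1 are commonly computed for extractive QA, following the SQuAD-style convention. It is an assumption that the paper uses exactly these definitions; the normalization rules and the function names here are illustrative only.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> float:
    """1.0 if the normalized strings are identical, otherwise 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction: str, truth: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    if not pred_tokens or not true_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == true_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("55 %", "55%"))   # 1.0
print(exact_match("EF 55%", "55%"))  # 0.0
print(round(f1_score("EF 55%", "55%"), 3))  # 0.667
```

Exact Match rewards only spans that agree completely after normalization, while F1 gives partial credit when the predicted span overlaps the labeled one, which is why the two are usually reported together. Note that stripping punctuation merges a range such as "55-60%" into a single token, so a clinical evaluation may want a normalization tailored to numeric spans.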

Interestingly, fine-tuning not only improved the model's accuracy in extracting EF values but also reduced its prompt sensitivity: the model became more robust and less dependent on the precise wording of questions. This is particularly valuable in clinical settings, where varying terms and phrasing can otherwise lead to inconsistencies in information extraction.
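One simple way to make prompt sensitivity concrete is to ask the same question in several paraphrases and check how often the answers agree. The sketch below is a hypothetical proxy, not the measure used in the paper: `prompt_agreement`, `brittle_model`, and `robust_model` are illustrative names, and the two toy models merely mimic a wording-dependent and a wording-independent QA model.

```python
from collections import Counter
from typing import Callable, Sequence


def prompt_agreement(
    qa_model: Callable[[str, str], str],
    questions: Sequence[str],
    context: str,
) -> float:
    """Fraction of paraphrased questions that yield the most common answer.

    1.0 means every phrasing produced the same span; lower values indicate
    that the model's output depends on how the question is worded.
    """
    answers = [qa_model(q, context).strip().lower() for q in questions]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)


def brittle_model(question: str, context: str) -> str:
    """Toy model that only answers when the question uses the full term."""
    return "55%" if "ejection fraction" in question.lower() else ""


def robust_model(question: str, context: str) -> str:
    """Toy model that answers regardless of wording."""
    return "55%"


paraphrases = [
    "What is the ejection fraction?",
    "What is the LVEF?",
    "What is the EF value?",
    "Report the ejection fraction.",
]
context = "LVEF is 55%."

print(prompt_agreement(brittle_model, paraphrases, context))  # 0.5
print(prompt_agreement(robust_model, paraphrases, context))   # 1.0
```

Agreement alone does not show that the answers are correct, so in practice it would be read alongside EM and F1 computed per paraphrase.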

The Road Ahead

This paper illustrates the potential of applying natural language processing to streamline clinical workflows. By sharing the underlying principles and methods, the research encourages further adaptation and application of AI-driven systems in low-resource settings, potentially improving the efficiency of many medical data analysis tasks.

However, the researchers acknowledge limitations, such as the dependency on OCR accuracy and the exclusion of private health information. Nonetheless, the work lays a foundation for similar approaches and demonstrates the utility of AI in helping healthcare professionals focus more on patient care than on administrative tasks.
