- The paper demonstrates that prompt-based learning models achieve high F scores, outperforming traditional methods in cohort selection.
- It employs extractive summarization and SNOMED CT-driven prompt engineering to classify EHRs based on clinical trial eligibility criteria.
- Results indicate improved precision, recall, and overall efficiency in patient recruitment, highlighting robust trial matching capabilities.
Towards Efficient Patient Recruitment for Clinical Trials: Application of a Prompt-Based Learning Model
Introduction
The paper investigates the application of a prompt-based learning model, specifically a Generative Pre-trained Transformer (GPT), to improve patient recruitment for clinical trials by leveraging electronic health records (EHRs). Given the challenges posed by unstructured medical text and the inefficiency of manual cohort selection, the paper proposes a novel approach built on natural language processing (NLP) with large language models (LLMs), exploiting their ability to process and classify medical records against eligibility criteria. The primary aim is to determine whether prompt-based learning can compete effectively with existing methods for cohort selection.
Methodology
Data and Framework
The paper used the dataset from the 2018 n2c2 challenge, which consists of 311 medical records labeled against 13 eligibility criteria. The approach selected the sentences most relevant to each trial criterion, annotated them with SNOMED CT concepts using MedCAT, and then applied prompt-based learning with a GPT model. The framework comprised extractive summarization, knowledge-graph generation from SNOMED CT concepts, and prompt engineering to create input questions for the model. Two scenarios were evaluated, one without summarization and one using the SNOMED CT-based summarization method, with the GPT-3.5 Turbo model's performance assessed via precision, recall, specificity, and F scores.
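The paper's summarization keeps sentences that MedCAT links to criterion-relevant SNOMED CT concepts. The minimal sketch below substitutes a plain keyword lookup for the MedCAT/SNOMED CT linking step; the criterion names match the challenge, but the term lists and clinical note are illustrative assumptions, not the paper's data.

```python
# Sketch of criterion-driven extractive summarization: keep only sentences
# that mention a concept tied to the trial criterion. The paper links spans
# to SNOMED CT concepts via MedCAT; a keyword lookup stands in for that here.

CRITERION_TERMS = {
    # Illustrative stand-ins for SNOMED CT concept matches.
    "CREATININE": {"creatinine", "renal", "kidney"},
    "ALCOHOL-ABUSE": {"alcohol", "etoh", "drinking"},
}

def summarize(record: str, criterion: str) -> str:
    """Return only the sentences relevant to the given criterion."""
    terms = CRITERION_TERMS[criterion]
    sentences = [s.strip() for s in record.split(".") if s.strip()]
    kept = [s for s in sentences if any(t in s.lower() for t in terms)]
    return ". ".join(kept)

note = ("Pt denies chest pain. Serum creatinine elevated, trending up. "
        "Social history: occasional alcohol use.")
print(summarize(note, "CREATININE"))  # prints: Serum creatinine elevated, trending up
```

The summarized record, rather than the full note, is what the prompt later presents to the model, shrinking the input and focusing it on the criterion at hand.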
Extractive Summarization and Prompt Engineering
Extractive summarization was achieved by identifying key concepts through SNOMED CT codes extracted with MedCAT. Prompt engineering converted criterion definitions into specific prompts that constrained the model's responses to a binary "yes" or "no" format. This allowed model outputs to map directly onto cohort selection decisions, while leaving temporal reasoning and inference to the model itself.
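The paper's exact prompt templates are not reproduced here; a minimal sketch of the binary prompt construction and output mapping described above, with hypothetical wording and criterion text, might look like:

```python
# Sketch of the yes/no prompt construction and decision mapping.
# The template wording and criterion definition are illustrative
# assumptions, not the paper's actual prompts.

def build_prompt(criterion_definition: str, record_summary: str) -> str:
    """Turn an eligibility-criterion definition into a yes/no question
    over the (summarized) patient record."""
    return (
        "Patient record:\n"
        f"{record_summary}\n\n"
        f"Question: {criterion_definition} "
        "Answer with 'yes' or 'no' only.\n"
        "Answer:"
    )

def parse_decision(model_output: str) -> bool:
    """Map the model's free-text answer onto a binary cohort decision."""
    return model_output.strip().lower().startswith("yes")

prompt = build_prompt(
    "Does the record indicate alcohol use exceeding weekly recommended limits?",
    "Social history: occasional alcohol use, 1-2 glasses of wine per month.",
)
decision = parse_decision("No.")  # -> False: criterion not met
```

The constrained answer format is what makes the mapping reliable: any output not beginning with "yes" is treated as a negative eligibility decision.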
Results
The paper's experimental results indicate a substantial improvement in cohort selection metrics from incorporating the summarization method. The prompt-based model achieved micro and macro F scores of 0.9061 and 0.8060, respectively, performing especially well on criteria such as ALCOHOL-ABUSE and CREATININE and surpassing other machine learning approaches on the dataset. It achieved the highest F scores on five of the thirteen criteria compared to existing ML-based methods, demonstrating robust text classification against complex medical eligibility criteria.
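Micro and macro F scores aggregate per-criterion results differently: micro pools true positives, false positives, and false negatives across all thirteen criteria before computing a single F score, while macro averages the thirteen per-criterion F scores, weighting rare criteria equally. A short sketch with made-up counts (not the paper's results) illustrates the distinction:

```python
# Micro vs. macro F1 over per-criterion confusion counts.
from collections import namedtuple

Counts = namedtuple("Counts", "tp fp fn")

def f1(tp: int, fp: int, fn: int) -> float:
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def micro_macro_f1(per_criterion: list[Counts]) -> tuple[float, float]:
    # Macro: average of per-criterion F1 scores.
    macro = sum(f1(*c) for c in per_criterion) / len(per_criterion)
    # Micro: pool counts across criteria, then compute one F1.
    tp = sum(c.tp for c in per_criterion)
    fp = sum(c.fp for c in per_criterion)
    fn = sum(c.fn for c in per_criterion)
    return f1(tp, fp, fn), macro

# Illustrative counts for three criteria (not the paper's data):
counts = [Counts(90, 5, 5), Counts(40, 10, 10), Counts(10, 2, 8)]
micro, macro = micro_macro_f1(counts)  # micro == 0.875; macro is lower,
# dragged down by the sparse third criterion.
```

This is why the paper's macro score (0.8060) sits well below its micro score (0.9061): criteria with few positive examples pull the macro average down even when pooled performance is strong.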
Discussion
Key limitations highlighted include challenges in handling abbreviations and temporal data parsing, affecting certain criteria such as DIETSUPP-2MOS and MI-6MOS. The paper also pointed out potential improvements by expanding SNOMED CT concepts for more nuanced clinical terms. The researchers propose that GPT models, due to their extensive pre-training across varied text sources, can better accommodate the diverse and complex nature of medical notes, potentially enhancing cohort selection in clinical trials when larger, more diverse datasets are employed.
The application of this model promises efficiency gains by reducing manual review requirements, easing preprocessing tasks, and streamlining trial matching processes. Future advancements might focus on refining prompt designs, improving concept extraction, and deepening SNOMED CT ontology integration.
Conclusion
The paper establishes that prompt-based learning models, specifically those built on GPT, exhibit strong capabilities in cohort selection for clinical trials, particularly when augmented with SNOMED CT-driven summarization. Such models could significantly improve trial recruitment by automating and scaling patient matching, especially when leveraging larger datasets. Future work could further refine these methodologies to bolster cohort selection efficacy and reliability across diverse medical data scenarios.