
A Deep Representation Empowered Distant Supervision Paradigm for Clinical Information Extraction (1804.07814v1)

Published 20 Apr 2018 in cs.IR

Abstract: Objective: To automatically create large labeled training datasets and reduce the effort of feature engineering needed to train accurate machine learning models for clinical information extraction. Materials and Methods: We propose a distant supervision paradigm empowered by deep representation for extracting information from clinical text. In this paradigm, rule-based NLP algorithms are used to generate weak labels and create large training datasets automatically. Additionally, we use pre-trained word embeddings as the deep representation to eliminate the need for task-specific feature engineering. We evaluated the effectiveness of the proposed paradigm on two clinical information extraction tasks: smoking status extraction and proximal femur (hip) fracture extraction. We tested three prevalent machine learning models, namely Convolutional Neural Networks (CNN), Support Vector Machine (SVM), and Random Forest (RF). Results: The results indicate that CNN is the best fit for the proposed distant supervision paradigm. Given large datasets, it outperforms the rule-based NLP algorithms by capturing additional extraction patterns. We also verified the advantage of the word embedding feature representation in the paradigm over term frequency-inverse document frequency (tf-idf) and topic modeling representations. Discussion: In the clinical domain, the limited amount of labeled data is always a bottleneck for applying machine learning. Additionally, the performance of machine learning approaches depends heavily on task-specific feature engineering. The proposed paradigm could alleviate both problems by leveraging rule-based NLP algorithms to automatically assign weak labels and by using a word embedding feature representation to eliminate the need for task-specific feature engineering.
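The paradigm described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the keyword rules, the two-dimensional toy embeddings, and the example notes are hypothetical, and a nearest-centroid classifier stands in for the CNN, SVM, and RF models the authors evaluated. The sketch shows the three steps only: rules assign weak labels, averaged word embeddings replace hand-built features, and a model trained on the weak labels can classify a note the rules miss.

```python
import re
from collections import defaultdict

# Hypothetical keyword rules standing in for a rule-based NLP algorithm.
SMOKER = re.compile(r"\b(smokes|smoker|tobacco use)\b", re.I)
NON_SMOKER = re.compile(r"\b(never smoked|non-smoker|denies smoking)\b", re.I)

def weak_label(text):
    """Assign a weak label from rules; negated patterns are checked first."""
    if NON_SMOKER.search(text):
        return "non-smoker"
    if SMOKER.search(text):
        return "smoker"
    return "unknown"

# Toy 2-d vectors standing in for pre-trained word embeddings (assumption).
TOY_EMBEDDINGS = {
    "smokes": (1.0, 0.0), "smoker": (1.0, 0.0),
    "cigarettes": (0.9, 0.1), "pack": (0.8, 0.2),
    "never": (0.0, 1.0), "denies": (0.1, 0.9),
    "smoking": (0.5, 0.5), "smoked": (0.5, 0.5),
}

def embed(text):
    """Average the embeddings of known tokens; zero vector if none are known."""
    vecs = [TOY_EMBEDDINGS[t] for t in re.findall(r"[a-z]+", text.lower())
            if t in TOY_EMBEDDINGS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))

def train_centroids(notes):
    """Fit one centroid per weak label, skipping notes the rules cannot label."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for note in notes:
        label = weak_label(note)
        if label == "unknown":
            continue
        vec = embed(note)
        sums[label][0] += vec[0]
        sums[label][1] += vec[1]
        counts[label] += 1
    return {lab: (s[0] / counts[lab], s[1] / counts[lab])
            for lab, s in sums.items()}

def predict(text, centroids):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    vec = embed(text)
    return min(centroids, key=lambda lab: sum(
        (a - b) ** 2 for a, b in zip(vec, centroids[lab])))

# Hypothetical unlabeled notes; the rules supply the training labels.
NOTES = [
    "Patient smokes cigarettes daily",
    "Current smoker, one pack per day",
    "Patient never smoked",
    "Denies smoking or alcohol",
]
centroids = train_centroids(NOTES)
```

In this toy setup, the note "Two packs of cigarettes a day" matches neither rule, so `weak_label` returns "unknown", while `predict` places it with the smoker class because "cigarettes" lies close to the smoker centroid. This mirrors, in miniature, the abstract's claim that a model trained on weak labels can capture extraction patterns the rules do not encode.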

Citations (6)
