Backdoor Adjustment of Confounding by Provenance for Robust Text Classification of Multi-institutional Clinical Notes (2310.02451v1)

Published 3 Oct 2023 in cs.CL

Abstract: NLP methods have been broadly applied to clinical tasks. Machine learning and deep learning approaches have been used to improve the performance of clinical NLP. However, these approaches require sufficiently large datasets for training, and trained models have been shown to transfer poorly across sites. These issues have motivated collecting and integrating data across institutions to build accurate and portable models, but doing so can introduce a form of bias known as confounding by provenance. When source-specific data distributions shift at deployment, this bias can harm model performance. To address this issue, we evaluate the utility of backdoor adjustment for text classification in a multi-site dataset of clinical notes annotated for mentions of substance abuse, using an evaluation framework devised to measure robustness to distributional shifts. Our results indicate that backdoor adjustment can effectively mitigate confounding shift.
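
The technique the abstract describes lends itself to a brief illustration. The Python sketch below shows one common way backdoor adjustment is applied to text classification with provenance as the confounder: the classifier is trained on text features augmented with a one-hot site indicator, so it models P(y | x, z), and at inference the site is marginalized out, P(y | do(x)) = Σ_z P(y | x, z) P(z). The scikit-learn pipeline, feature choices, and toy data are illustrative assumptions, not the paper's actual implementation.

```python
# A minimal, illustrative sketch of backdoor adjustment for text
# classification with data source (provenance) as the confounder.
# The toy notes, site labels, and model choice are hypothetical and
# are not taken from the paper.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy multi-site corpus: each note carries a provenance label z (its site).
notes = [
    "patient reports daily alcohol use",
    "denies tobacco or alcohol use",
    "documented history of opioid misuse",
    "no substance use reported at this visit",
]
sites = ["site_a", "site_a", "site_b", "site_b"]   # confounder z
labels = [1, 0, 1, 0]                              # substance-abuse mention

vec = TfidfVectorizer()
X_text = vec.fit_transform(notes).toarray()

# Augment text features with a one-hot site indicator so the model fits P(y | x, z).
site_ids = sorted(set(sites))
Z = np.array([[float(s == sid) for sid in site_ids] for s in sites])
X = np.hstack([X_text, Z])

clf = LogisticRegression(max_iter=1000).fit(X, labels)

# Empirical P(z), estimated from the training data.
p_z = Z.mean(axis=0)

def adjusted_proba(texts):
    """Backdoor-adjusted prediction: P(y | do(x)) = sum_z P(y | x, z) P(z)."""
    x = vec.transform(texts).toarray()
    probs = np.zeros((len(texts), len(clf.classes_)))
    for k in range(len(site_ids)):
        z = np.zeros((len(texts), len(site_ids)))
        z[:, k] = 1.0  # pretend every note came from site k ...
        probs += p_z[k] * clf.predict_proba(np.hstack([x, z]))  # ... and weight by P(z)
    return probs

print(adjusted_proba(["patient admits to heavy alcohol use"]))
```

Because the prediction marginalizes over z instead of conditioning on the deployment site's (possibly shifted) provenance distribution, the classifier's reliance on site-specific artifacts is reduced, which is the kind of robustness to confounding shift the paper evaluates.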
