Causal Forests for Discovering Diagnostic Language in Electronic Health Records
Published in | Applied stochastic models in business and industry Vol. 41; no. 5 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | 01.09.2025 |
Summary: | Textual analysis has gained significant interest in medical research, particularly for automated patient diagnosis based on clinical narratives. While traditional approaches often focus on associational methods, this paper explores the application of causal forests to analyze textual data from electronic health records (EHRs), aiming to identify causal relationships between specific words and the likelihood of receiving certain medical diagnoses. Utilizing the MIMIC‐III dataset, we assess how linguistic factors influence diagnosis probabilities for three conditions: diabetes, hypothyroidism, and adrenal gland disorders. Our findings reveal significant causal links between certain clinical terms and diagnosis probabilities, emphasizing the potential of causal inference techniques to improve the analysis of language in clinical narratives. Additionally, we uncover heterogeneity in treatment effects, demonstrating that specific words can identify high‐risk patient subgroups. This study highlights the importance of integrating causal inference in natural language processing within healthcare settings. |
---|---|
ISSN: | 1524-1904 1526-4025 |
DOI: | 10.1002/asmb.70038 |