Deep significance clustering: a novel approach for identifying risk-stratified and predictive patient subgroups

Abstract Objective Deep significance clustering (DICE) is a self-supervised learning framework. DICE identifies clinically similar and risk-stratified subgroups that neither unsupervised clustering algorithms nor supervised risk prediction algorithms alone are guaranteed to generate. Materials and M...

Full description

Saved in:
Bibliographic Details
Published inJournal of the American Medical Informatics Association : JAMIA Vol. 28; no. 12; pp. 2641 - 2653
Main Authors Huang, Yufang, Liu, Yifan, Steel, Peter A D, Axsom, Kelly M, Lee, John R, Tummalapalli, Sri Lekha, Wang, Fei, Pathak, Jyotishman, Subramanian, Lakshminarayanan, Zhang, Yiye
Format Journal Article
LanguageEnglish
Published England Oxford University Press 25.11.2021
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Abstract Objective Deep significance clustering (DICE) is a self-supervised learning framework. DICE identifies clinically similar and risk-stratified subgroups that neither unsupervised clustering algorithms nor supervised risk prediction algorithms alone are guaranteed to generate. Materials and Methods Enabled by an optimization process that enforces statistical significance between the outcome and subgroup membership, DICE jointly trains 3 components, representation learning, clustering, and outcome prediction while providing interpretability to the deep representations. DICE also allows unseen patients to be predicted into trained subgroups for population-level risk stratification. We evaluated DICE using electronic health record datasets derived from 2 urban hospitals. Outcomes and patient cohorts used include discharge disposition to home among heart failure (HF) patients and acute kidney injury among COVID-19 (Cov-AKI) patients, respectively. Results Compared to baseline approaches including principal component analysis, DICE demonstrated superior performance in the cluster purity metrics: Silhouette score (0.48 for HF, 0.51 for Cov-AKI), Calinski-Harabasz index (212 for HF, 254 for Cov-AKI), and Davies-Bouldin index (0.86 for HF, 0.66 for Cov-AKI), and prediction metric: area under the Receiver operating characteristic (ROC) curve (0.83 for HF, 0.78 for Cov-AKI). Clinical evaluation of DICE-generated subgroups revealed more meaningful distributions of member characteristics across subgroups, and higher risk ratios between subgroups. Furthermore, DICE-generated subgroup membership alone was moderately predictive of outcomes. Discussion DICE addresses a gap in current machine learning approaches where predicted risk may not lead directly to actionable clinical steps. Conclusion DICE demonstrated the potential to apply in heterogeneous populations, where having the same quantitative risk does not equate with having a similar clinical profile.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1527-974X
1067-5027
1527-974X
DOI:10.1093/jamia/ocab203