Deep learning predicts extreme preterm birth from electronic health records

Bibliographic Details
Published in Journal of biomedical informatics Vol. 100; p. 103334
Main Authors Gao, Cheng; Osmundson, Sarah; Velez Edwards, Digna R.; Jackson, Gretchen Purcell; Malin, Bradley A.; Chen, You
Format Journal Article
Language English
Published United States Elsevier Inc 01.12.2019
ISSN 1532-0464, 1532-0480
DOI 10.1016/j.jbi.2019.103334

Summary:
• Extreme preterm birth (EPB) accounts for the majority of newborn deaths.
• Deep learning models that consider temporal relations can predict EPB.
• Deep learning ensemble models achieve a higher performance than individual models.
• EPB is associated with significant morbidity, e.g., systemic lupus erythematosus.

Models for predicting preterm birth have generally focused on very preterm (28–32 weeks) and moderate to late preterm (32–37 weeks) settings. However, extreme preterm birth (EPB), before the 28th week of gestational age, accounts for the majority of newborn deaths. We investigated the extent to which deep learning models that consider temporal relations documented in electronic health records (EHRs) can predict EPB.

EHR data were subjected to word embedding and then modeled with a temporal deep learning model, in the form of recurrent neural networks (RNNs), to predict EPB. Due to the low prevalence of EPB, the models were trained on datasets where controls were undersampled to balance the case-control ratio. We then applied an ensemble approach to group the trained models to predict EPB in an evaluation setting with a natural EPB ratio. We evaluated the RNN ensemble models with 10 years of EHR data from 25,689 deliveries at Vanderbilt University Medical Center. We compared their performance with traditional machine learning models (logistic regression, support vector machine, gradient boosting) trained on datasets with balanced and natural EPB ratios. Risk factors associated with EPB were identified using an adjusted odds ratio.

The RNN ensemble models trained on artificially balanced data achieved a higher AUC (0.827 vs. 0.744) and sensitivity (0.965 vs. 0.682) than RNN models trained on datasets with the naturally imbalanced EPB ratio. In addition, the AUC (0.827) and sensitivity (0.965) of the RNN ensemble models were better than the AUC (0.777) and sensitivity (0.819) of the best baseline models trained on balanced data. Risk factors including twin pregnancy, short cervical length, hypertensive disorder, systemic lupus erythematosus, and hydroxychloroquine sulfate were also found to be associated with EPB at a significant level.

Temporal deep learning can predict EPB up to 8 weeks before it occurs. Accurate prediction of EPB may allow healthcare organizations to allocate resources effectively and ensure patients receive appropriate care.
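The undersample-then-ensemble idea described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not state how member predictions were combined, so averaging is an assumption here, and `fit_threshold` is a hypothetical one-feature stand-in for the RNN. All function names and the `(feature, label)` record layout are invented for illustration.

```python
import random
from statistics import mean


def balanced_subsets(cases, controls, n_models, seed=0):
    """Yield n_models training sets: all cases plus an equal-sized random draw of controls."""
    rng = random.Random(seed)
    k = min(len(cases), len(controls))
    for _ in range(n_models):
        # Undersample the majority class (controls) without replacement.
        yield cases + rng.sample(controls, k)


def fit_threshold(records):
    """Hypothetical base learner (stand-in for an RNN): cut at the midpoint of the class means."""
    case_mean = mean(x for x, y in records if y == 1)
    control_mean = mean(x for x, y in records if y == 0)
    cutoff = (case_mean + control_mean) / 2
    return lambda x: 1.0 if x > cutoff else 0.0


def train_ensemble(cases, controls, fit, n_models=5, seed=0):
    """Train one model per balanced subset."""
    return [fit(subset) for subset in balanced_subsets(cases, controls, n_models, seed)]


def ensemble_score(models, x):
    """Combine member outputs by simple averaging (an assumption, not the paper's stated rule)."""
    return mean(model(x) for model in models)
```

In use, each record would be scored with `ensemble_score` on an evaluation set that keeps the natural case-control ratio, so that reported metrics reflect the real prevalence rather than the balanced training mix.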
Contributors: CG performed the data collection and analysis, method design, experimental design, evaluation and interpretation of the results, and drafting and revising of the manuscript. SO, DE and GJ performed evaluation and interpretation of the results and revising of the manuscript. BM performed method design, experimental design, evaluation and interpretation of the results, and revising of the manuscript. YC performed the data collection and analysis, method design, experimental design, evaluation and interpretation of the results, and drafting and revising of the manuscript.