Missing Data Imputation in the Electronic Health Record Using Deeply Learned Autoencoders

Bibliographic Details
Published in	Pacific Symposium on Biocomputing, Vol. 22, p. 207
Main Authors	Beaulieu-Jones, Brett K; Moore, Jason H
Format	Journal Article
Language	English
Published	United States 01.01.2017
Subjects	Amyotrophic Lateral Sclerosis - physiopathology; Bias; Clinical Trials as Topic - statistics & numerical data; Computational Biology; Databases, Factual - statistics & numerical data; Disease Progression; Electronic Health Records - statistics & numerical data; Humans; Neural Networks, Computer

More Information
Summary:	Electronic health records (EHRs) have become a vital source of patient outcome data, but the widespread prevalence of missing data presents a major challenge. Different causes of missing data in EHR data may introduce unintentional bias. Here, we compare the effectiveness of popular multiple imputation strategies with a deeply learned autoencoder using the Pooled Resource Open-Access ALS Clinical Trials Database (PRO-ACT). To evaluate performance, we examined imputation accuracy for known values simulated to be either missing completely at random or missing not at random. We also compared ALS disease progression prediction across different imputation models. Autoencoders showed strong imputation accuracy and contributed to the strongest disease progression predictor. Finally, we show that despite clinical heterogeneity, ALS disease progression appears homogeneous, with time from onset being the most important predictor.
ISSN:	2335-6936
DOI:	10.1142/9789813207813_0021
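The evaluation protocol described in the summary (hide known values as either missing completely at random or missing not at random, impute them, then score the imputations against the true values) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The data are synthetic rather than PRO-ACT. The MNAR scheme, in which larger values are hidden more often, is one hypothetical choice. Column-mean imputation stands in as a simple baseline and is not one of the multiple imputation strategies the study compared. `impute_autoencoder` is a hypothetical single-hidden-layer denoising autoencoder whose size, learning rate, and dropout rate are illustrative guesses. All function names are hypothetical.

```python
import numpy as np


def mask_mcar(X, frac, rng):
    """MCAR: every entry is hidden with the same probability, independent of its value."""
    return rng.random(X.shape) < frac


def mask_mnar(X, frac, rng):
    """Hypothetical MNAR scheme: larger values are more likely to be hidden.

    Each entry's hiding probability is 2 * frac * (its within-column rank in [0, 1]),
    so the expected missing fraction per column is frac (valid for frac <= 0.5).
    """
    ranks = X.argsort(axis=0).argsort(axis=0) / (X.shape[0] - 1)
    return rng.random(X.shape) < 2 * frac * ranks


def impute_mean(X, mask):
    """Stand-in baseline: fill hidden entries with the column mean of observed entries."""
    col_means = np.nanmean(np.where(mask, np.nan, X), axis=0)
    return np.where(mask, col_means, X)


def impute_autoencoder(X, mask, hidden=8, epochs=500, lr=0.3, drop=0.2, seed=0):
    """Hypothetical single-hidden-layer denoising autoencoder imputer (full-batch GD)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    obs = ~mask
    # Standardize with statistics of the observed entries only.
    X_obs = np.where(mask, np.nan, X)
    mu = np.nanmean(X_obs, axis=0)
    sd = np.nanstd(X_obs, axis=0) + 1e-8
    # Hidden entries start at 0, i.e. the column mean after standardization.
    Z = np.where(mask, 0.0, (X - mu) / sd)

    W1 = rng.normal(0.0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d))
    b2 = np.zeros(d)
    n_obs = obs.sum()

    for _ in range(epochs):
        # Denoising: randomly drop inputs (inverted dropout keeps the expected scale).
        keep = rng.random(Z.shape) >= drop
        Z_in = Z * keep / (1.0 - drop)
        H = np.tanh(Z_in @ W1 + b1)
        R = H @ W2 + b2
        # Gradient of the mean squared reconstruction error over observed entries only.
        g_R = 2.0 * (R - Z) * obs / n_obs
        g_H = (g_R @ W2.T) * (1.0 - H ** 2)  # computed before W2 is updated
        W2 -= lr * (H.T @ g_R)
        b2 -= lr * g_R.sum(axis=0)
        W1 -= lr * (Z_in.T @ g_H)
        b1 -= lr * g_H.sum(axis=0)

    # Reconstruct without dropout, undo the scaling, and fill only the hidden entries.
    R = np.tanh(Z @ W1 + b1) @ W2 + b2
    return np.where(mask, R * sd + mu, X)


def rmse_on_hidden(X_true, X_imp, mask):
    """Imputation accuracy: RMSE over the entries that were known but hidden."""
    return float(np.sqrt(np.mean((X_true[mask] - X_imp[mask]) ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic correlated data (rank-2 signal plus noise) standing in for clinical features.
    X = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(300, 6))
    for name, masker in [("MCAR", mask_mcar), ("MNAR", mask_mnar)]:
        mask = masker(X, 0.2, rng)
        print(name,
              "mean RMSE:", round(rmse_on_hidden(X, impute_mean(X, mask), mask), 3),
              "AE RMSE:", round(rmse_on_hidden(X, impute_autoencoder(X, mask), mask), 3))
```

Accuracy is scored only on the hidden entries because both imputers pass observed entries through unchanged, so including them would inflate apparent accuracy. Under MNAR, the mean baseline is biased by construction: the hidden entries skew high, while the observed mean skews low.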