Benchmarking deep learning models on large healthcare datasets

Bibliographic Details
Published in: Journal of Biomedical Informatics, Vol. 83, pp. 112-134
Main Authors: Purushotham, Sanjay; Meng, Chuizheng; Che, Zhengping; Liu, Yan
Format: Journal Article
Language: English
Published: United States, Elsevier Inc., 01.07.2018
Summary:
Highlights:
• Exhaustive benchmarking evaluation of deep learning models on the MIMIC-III dataset.
• Mortality, length-of-stay, and ICD-9 code prediction tasks are used for evaluation.
• Deep learning models achieve the best performance compared to all existing models.
• Deep learning models perform well with raw clinical time series features.

Deep learning models (also known as deep neural networks) have revolutionized many fields, including computer vision, natural language processing, and speech recognition, and are increasingly being used in clinical healthcare applications. However, few works have benchmarked the performance of deep learning models against state-of-the-art machine learning models and prognostic scoring systems on publicly available healthcare datasets. In this paper, we present benchmarking results for several clinical prediction tasks, including mortality prediction, length-of-stay prediction, and ICD-9 code group prediction, using deep learning models, an ensemble of machine learning models (the Super Learner algorithm), and the SAPS II and SOFA scores. For the benchmarking tasks we used the publicly available Medical Information Mart for Intensive Care III (MIMIC-III) (v1.4) dataset, which includes all patients admitted to an ICU at the Beth Israel Deaconess Medical Center from 2001 to 2012. Our results show that deep learning models consistently outperform all the other approaches, especially when the 'raw' clinical time series data are used as input features to the models.
ISSN: 1532-0464, 1532-0480
DOI: 10.1016/j.jbi.2018.04.007