Benchmarking deep learning models on large healthcare datasets

Bibliographic Details
Published in: Journal of Biomedical Informatics, Vol. 83, pp. 112-134
Main Authors: Purushotham, Sanjay; Meng, Chuizheng; Che, Zhengping; Liu, Yan
Format: Journal Article
Language: English
Published: United States, Elsevier Inc., 01.07.2018
Summary:
Highlights:
• Exhaustive benchmarking evaluation of deep learning models on the MIMIC-III dataset.
• Mortality, length-of-stay, and ICD-9 code prediction tasks are used for evaluation.
• Deep learning models achieve the best performance compared to all existing models.
• Deep learning models perform well with raw clinical time series features.

Deep learning models (also known as deep neural networks) have revolutionized many fields, including computer vision, natural language processing, and speech recognition, and are increasingly being used in clinical healthcare applications. However, few works have benchmarked the performance of deep learning models against state-of-the-art machine learning models and prognostic scoring systems on publicly available healthcare datasets. In this paper, we present benchmarking results for several clinical prediction tasks, including mortality prediction, length-of-stay prediction, and ICD-9 code group prediction, using deep learning models, an ensemble of machine learning models (the Super Learner algorithm), and the SAPS II and SOFA scores. For the benchmarking tasks we used the publicly available Medical Information Mart for Intensive Care III (MIMIC-III) (v1.4) dataset, which includes all patients admitted to an ICU at the Beth Israel Deaconess Medical Center from 2001 to 2012. Our results show that deep learning models consistently outperform all the other approaches, especially when the 'raw' clinical time series data are used as input features to the models.
ISSN: 1532-0464, 1532-0480
DOI: 10.1016/j.jbi.2018.04.007