Gene Expression Data Based Deep Learning Model for Accurate Prediction of Drug-Induced Liver Injury in Advance

Bibliographic Details
Published in	Journal of Chemical Information and Modeling Vol. 59; no. 7; pp. 3240-3250
Main Authors	Feng, Chunlai; Chen, Hengwei; Yuan, Xianqin; Sun, Mengqiu; Chu, Kexin; Liu, Hanqin; Rui, Mengjie
Format	Journal Article
Language	English
Published	United States: American Chemical Society, 22.07.2019
Subjects	Algorithms; Correlation coefficients; Datasets; Deep learning; Gene expression; Liver; Machine learning; Model accuracy; Optimization; Prediction models; Safety; Sensitivity; Support vector machines
Summary:	Drug-induced liver injury (DILI), one of the most common adverse drug effects, leads in most cases to drug development failure or withdrawal from the market, which makes accurate early-stage prediction of DILI an emerging challenge. The large amount of gene expression data now available provides valuable information for distinguishing DILI on a genomic scale. Moreover, deep learning (DL) is a powerful strategy for automatically learning important features from raw and noisy data and has shown great success in medical diagnosis. In this study, a gene expression data based DL model was developed to predict DILI in advance, using DILI-associated gene expression data collected from ArrayExpress, and was then optimized by feature gene selection and parameter optimization. In addition, a support vector machine (SVM), a previously established machine learning algorithm, was used to construct a second prediction model on the same data sets so that its performance could be compared with that of the optimal DL model. In an evaluation test using 198 randomly selected samples, the optimal DL model achieved 97.1% accuracy, 97.4% sensitivity, 96.8% specificity, a Matthews correlation coefficient of 0.942, and an area under the ROC curve of 0.989, whereas the SVM model reached only 88.9% accuracy, 78.8% sensitivity, 99.0% specificity, a Matthews correlation coefficient of 0.794, and an area under the ROC curve of 0.901. Furthermore, external data set verification and animal experiments were conducted to assess the performance of the optimal DL model, and its predictions were largely consistent with the experimental results. These results indicate that the gene expression data based deep learning model can systematically and accurately predict DILI in advance. It could be a useful tool for providing safety information for drug discovery and rational clinical drug use at an early stage, and could become an important part of drug safety assessment.
ISSN:	1549-9596, 1549-960X
DOI:	10.1021/acs.jcim.9b00143