Statistical models for predicting liver toxicity from genomic data


Bibliographic Details
Published in: Systems biomedicine Vol. 1; no. 2; pp. 144-149
Main Authors: Bowles, Mike; Shigeta, Ron
Format: Journal Article
Language: English
Published: Taylor & Francis 11.04.2013
Summary: This paper outlines the construction of statistical models for liver pathology in rats and for drug-induced liver injury. The envisioned purpose of these models is to reduce the cost of discovering compound toxicity and thereby the overall cost of drug discovery. The size and breadth of the CAMDA liver toxicity data set present a unique opportunity to test whether statistical toxicity models can serve this purpose. The paper develops models for predicting toxicity from gene expression data. These models purposely exclude the physiology and pathology data available in the CAMDA data, because such data require live rats and expensive, time-consuming processing, which is antithetical to the goal of reducing the time and cost required to determine compound toxicity. Two models are described. One employs Lasso regression and the glmnet algorithm to extract models for rat liver pathology. The other employs stochastic gradient boosting to extract models for drug-induced liver injury. The paper demonstrates that, given a data set of the size and quality of the CAMDA data, modern machine learning algorithms can extract high-quality models, that is, models with sufficient accuracy and specificity to serve the goal of reducing the cost of discovering compound toxicity.
ISSN: 2162-8130, 2162-8149
DOI: 10.4161/sysb.24254