Protein Family-Specific Models Using Deep Neural Networks and Transfer Learning Improve Virtual Screening and Highlight the Need for More Data

Machine learning has shown enormous potential for computer-aided drug discovery. Here we show how modern convolutional neural networks (CNNs) can be applied to structure-based virtual screening. We have coupled our densely connected CNN (DenseNet) with a transfer learning approach which we use to pr...

Full description

Saved in:
Bibliographic Details
Published inJournal of chemical information and modeling Vol. 58; no. 11; pp. 2319 - 2330
Main Authors Imrie, Fergus, Bradley, Anthony R, van der Schaar, Mihaela, Deane, Charlotte M
Format Journal Article
LanguageEnglish
Published United States American Chemical Society 26.11.2018
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Machine learning has shown enormous potential for computer-aided drug discovery. Here we show how modern convolutional neural networks (CNNs) can be applied to structure-based virtual screening. We have coupled our densely connected CNN (DenseNet) with a transfer learning approach which we use to produce an ensemble of protein family-specific models. We conduct an in-depth empirical study and provide the first guidelines on the minimum requirements for adopting a protein family-specific model. Our method also highlights the need for additional data, even in data-rich protein families. Our approach outperforms recent benchmarks on the DUD-E data set and an independent test set constructed from the ChEMBL database. Using a clustered cross-validation on DUD-E, we achieve an average AUC ROC of 0.92 and a 0.5% ROC enrichment factor of 79. This represents an improvement in early enrichment of over 75% compared to a recent machine learning benchmark. Our results demonstrate that the continued improvements in machine learning architecture for computer vision apply to structure-based virtual screening.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1549-9596
1549-960X
DOI:10.1021/acs.jcim.8b00350