Adapting Deep Learning QSPR Models to Specific Drug Discovery Projects

Bibliographic Details
Published in Molecular Pharmaceutics Vol. 21; no. 4; pp. 1817-1826
Main Authors Fluetsch, Andrin; Di Lascio, Elena; Gerebtzoff, Grégori; Rodríguez-Pérez, Raquel
Format Journal Article
Language English
Published United States: American Chemical Society, 01.04.2024
Summary: Medicinal chemistry and drug design efforts can be assisted by machine learning (ML) models that relate the molecular structure to compound properties. Such quantitative structure–property relationship models are generally trained on large data sets that include diverse chemical series (global models). In the pharmaceutical industry, these ML global models are available across discovery projects as an “out-of-the-box” solution to assist in drug design, synthesis prioritization, and experiment selection. However, drug discovery projects typically focus on confined parts of the chemical space (e.g., chemical series), where global models might not be applicable. Local ML models are sometimes generated to focus on specific projects or series. Herein, ML-based global models, local models, and hybrid global-local strategies were benchmarked. Analyses were done for more than 300 drug discovery projects at Novartis and ten absorption, distribution, metabolism, and excretion (ADME) assays. In this work, hybrid global-local strategies based on transfer learning approaches were proposed to leverage both historical ADME data (global) and project-specific data (local) to adapt model predictions. Fine-tuning a pretrained global ML model (used for weights’ initialization, WI) was the top-performing method. Average improvements of mean absolute errors across all assays were 16% and 27% compared with global and local models, respectively. Interestingly, when the effect of training set size was analyzed, WI fine-tuning was found to be successful even in low-data scenarios (e.g., ∼10 molecules per project). Taken together, this work highlights the potential of domain adaptation in the field of molecular property predictions to refine existing pretrained models on a new compound data distribution.
ISSN: 1543-8384, 1543-8392
DOI: 10.1021/acs.molpharmaceut.3c01124
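
Illustrative note (not part of the catalog record or the article): the summary contrasts a global model, a local model trained only on project data, and a global model fine-tuned on project data (weight initialization, WI). The sketch below is a minimal toy version of that comparison, not the authors' method. It assumes a linear model trained by gradient descent as a stand-in for a deep network, and synthetic Gaussian data as a stand-in for ADME measurements. All names (`train`, `run_demo`, `relative_improvement`) and all settings (dimensions, learning rate, noise levels) are hypothetical. The relative-improvement formula is one common way to express a percentage reduction in mean absolute error; the record does not state how the article computed its percentages.

```python
import random


def predict(w, x):
    """Linear model: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train(X, y, w_init, lr=0.05, epochs=200):
    """Full-batch gradient descent on mean squared error, starting from w_init."""
    w = list(w_init)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = predict(w, x) - t
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def mae(w, X, y):
    """Mean absolute error of the linear model on (X, y)."""
    return sum(abs(predict(w, x) - t) for x, t in zip(X, y)) / len(X)


def relative_improvement(baseline_mae, new_mae):
    """Percentage reduction in MAE relative to a baseline (assumed definition)."""
    return 100.0 * (baseline_mae - new_mae) / baseline_mae


def make_data(rng, w_true, n, noise):
    """Synthetic features and noisy linear labels (assumption: toy data only)."""
    X = [[rng.gauss(0, 1) for _ in w_true] for _ in range(n)]
    y = [predict(w_true, x) + rng.gauss(0, noise) for x in X]
    return X, y


def run_demo(seed=0, dim=16, n_global=200, n_local=8):
    rng = random.Random(seed)
    # The "project" relationship is a shifted version of the "global" one.
    w_global_true = [rng.gauss(0, 1) for _ in range(dim)]
    w_local_true = [w + rng.gauss(0, 0.3) for w in w_global_true]

    X_glob, y_glob = make_data(rng, w_global_true, n_global, 0.1)
    X_loc, y_loc = make_data(rng, w_local_true, n_local, 0.05)   # few project points
    X_test, y_test = make_data(rng, w_local_true, 200, 0.05)     # held-out project data

    zeros = [0.0] * dim
    w_global = train(X_glob, y_glob, zeros)       # global model
    w_local = train(X_loc, y_loc, zeros)          # local model trained from scratch
    w_finetuned = train(X_loc, y_loc, w_global)   # WI: start from the global weights

    return {
        "global": mae(w_global, X_test, y_test),
        "local": mae(w_local, X_test, y_test),
        "fine_tuned": mae(w_finetuned, X_test, y_test),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: MAE = {value:.3f}")
```

In this toy setting the only difference between the local and the fine-tuned model is the starting point of training. With fewer project data points than features, the starting weights determine everything the local data cannot constrain, which is one intuition for why initializing from a pretrained model can help in low-data scenarios.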