Systematic Evaluation of Local and Global Machine Learning Models for the Prediction of ADME Properties

Bibliographic Details
Published in	Molecular Pharmaceutics, Vol. 20, no. 3, pp. 1758-1767
Main Authors	Di Lascio, Elena; Gerebtzoff, Grégori; Rodríguez-Pérez, Raquel
Format	Journal Article
Language	English
Published	United States: American Chemical Society, 06.03.2023
Subjects	Algorithms; Drug Discovery - methods; Machine Learning; Molecular Structure; Pharmaceutical Preparations; Pharmacokinetics; Quantitative Structure-Activity Relationship; predictive models; ADME; Machine learning; global models; pharmacokinetics; local models; medicinal chemistry; data shift

Summary:	Machine learning (ML) has become an indispensable tool to predict absorption, distribution, metabolism, and excretion (ADME) properties in pharmaceutical research. ML algorithms are trained on molecular structures and corresponding ADME assay data to develop quantitative structure–property relationship (QSPR) models. Traditional QSPR models were trained on compound sets of limited size. With the advent of more complex ML algorithms and greater data availability, training sets have become larger and more diverse. The most common approaches train a model either with a small set of similar compounds, namely compounds designed for the same drug discovery project or chemical series (local model approach), or with a larger set of diverse compounds (global model approach). Global models are built with all experimental data available for an assay, combining compound data from different projects and disease areas. Despite the ML progress made so far, the appropriate data composition for building ML models is still unclear. Herein, a systematic evaluation of local and global ML models was performed for 10 different experimental assays and 112 drug discovery projects. Results show consistently superior performance of global models for ADME property predictions. Diagnostic analyses were also carried out to investigate the influence of training set size, structural diversity, and data shift on the relative performance of local and global ML models. Training set size and structural diversity did not have an impact on the relative performance of the methods. Instead, data shift helped to identify the projects with larger performance differences between local and global models. The results presented in this work can be leveraged to improve ML-based ADME property predictions and thus decision-making in drug discovery projects.
ISSN:	1543-8384 1543-8392
DOI:	10.1021/acs.molpharmaceut.2c00962