TopP–S: Persistent homology‐based multi‐task deep neural networks for simultaneous predictions of partition coefficient and aqueous solubility

Bibliographic Details
Published in: Journal of computational chemistry, Vol. 39, no. 20, pp. 1444-1454
Main Authors: Wu, Kedi; Zhao, Zhixiong; Wang, Renxiao; Wei, Guo‐Wei
Format: Journal Article
Language: English
Published: United States, Wiley Subscription Services, Inc, 30.07.2018
Summary: Aqueous solubility and partition coefficient are important physical properties of small molecules. Accurate theoretical prediction of aqueous solubility and partition coefficient plays an important role in drug design and discovery. The prediction accuracy depends crucially on molecular descriptors, which are typically derived from a theoretical understanding of the chemistry and physics of small molecules. This work introduces an algebraic topology‐based method, called element‐specific persistent homology (ESPH), as a new representation of small molecules that is entirely different from conventional chemical and/or physical representations. ESPH describes molecular properties in terms of multiscale and multicomponent topological invariants. Such a topological representation is systematic, comprehensive, and scalable with respect to variations in molecular size and composition. However, it cannot be literally translated into a physical interpretation. Fortunately, it is readily suitable for machine learning methods, giving rise to topological learning algorithms. Because solubility and partition coefficient are inherently correlated, a uniform ESPH representation is developed for both properties, which facilitates multi‐task deep neural networks for their simultaneous prediction. This strategy leads to more accurate predictions for relatively small datasets. Six datasets are considered in this work to validate the proposed topological and multitask deep learning approaches. It is demonstrated that the proposed approaches achieve some of the most accurate predictions of aqueous solubility and partition coefficient. Our software is available online at http://weilab.math.msu.edu/TopP-S/. © 2018 Wiley Periodicals, Inc.

Accurate and efficient predictions of partition coefficient and aqueous solubility are essential to computer‐aided drug design and discovery. This work integrates algebraic topology and multitask deep learning strategies for these predictions. Algebraic topology is designed to characterize non‐covalent interactions, while multitask deep learning is tailored to transfer characteristics learned from large datasets to small ones to improve their prediction accuracy. Numerical experiments on benchmark datasets demonstrate the success of the proposed approach.
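
As a rough illustration of what an element‐specific topological invariant can look like, the sketch below computes 0‐dimensional persistence (connected components) for a chosen subset of atom types. It is not the authors' ESPH implementation: the summary does not spell out the construction, so the Vietoris-Rips filtration, the function names `h0_barcode` and `element_specific_h0`, and the toy coordinates are all assumptions made for illustration.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Death scales of finite 0-dimensional bars in a Vietoris-Rips filtration.

    Every component is born at scale 0 and dies when it merges with another
    component, which happens at the minimum-spanning-tree edge lengths.
    The single component that never dies (the infinite bar) is omitted.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, processed from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge: one bar dies
            deaths.append(length)
    return deaths


def element_specific_h0(atoms, elements):
    """H0 death scales using only atoms whose symbol is in `elements`."""
    coords = [xyz for symbol, xyz in atoms if symbol in elements]
    return h0_barcode(coords)


# Hypothetical toy coordinates (angstroms); not data from the paper.
toy_atoms = [
    ("C", (0.0, 0.0, 0.0)),
    ("C", (1.5, 0.0, 0.0)),
    ("O", (1.5, 1.2, 0.0)),
    ("C", (4.5, 0.0, 0.0)),
]
carbon_bars = element_specific_h0(toy_atoms, {"C"})
carbon_oxygen_bars = element_specific_h0(toy_atoms, {"C", "O"})
```

Restricting the point cloud to different element sets yields different barcodes for the same molecule, which is one plausible reading of "multicomponent"; the range of merge scales reflects the "multiscale" aspect. Turning such barcodes into fixed‐length features for a neural network is a separate step not shown here.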
ISSN: 0192-8651
1096-987X
DOI: 10.1002/jcc.25213