Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening

Bibliographic Details
Published in	PLoS computational biology Vol. 14; no. 1; p. e1005929
Main Authors	Cang, Zixuan, Mu, Lin, Wei, Guo-Wei
Format	Journal Article
Language	English
Published	United States: Public Library of Science (PLoS), 08.01.2018
Subjects	Accuracy; Algebra; Algorithms; Area Under Curve; Artificial intelligence; Artificial neural networks; Binding; Biochemistry & Molecular Biology; Bioinformatics; Biological research; Biology; Biology and Life Sciences; Biomolecules; Computational biology; Computational Biology - methods; Computational chemistry; Computer and Information Sciences; Coordination compounds; Databases, Protein; Datasets; Electrostatic properties; Electrostatics; Funding; Homology; Humans; Induction algorithms; Learning algorithms; Ligands; Machine Learning; Mathematical & Computational Biology; Mathematics; Methods; Models, Neurological; Molecular Dynamics Simulation; Molecular interactions; Neural networks; Neural Networks, Computer; Nucleic Acids - chemistry; Physical Sciences; Physics; Predictions; Protein Binding; Protein Interaction Mapping; Proteins; Proteins - chemistry; Research and Analysis Methods; Screening; Static Electricity; Topology; United States > US East Lansing Michigan

More Information
Summary:	This work introduces a number of algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. In contrast to the conventional persistent homology, multi-component persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for protein-ligand binding analysis and virtual screening of small molecules. Extensive numerical experiments involving 4,414 protein-ligand complexes from the PDBBind database and 128,374 ligand-target and decoy-target pairs in the DUD database are performed to test respectively the scoring power and the discriminatory power of the proposed topological learning strategies. It is demonstrated that the present topological learning outperforms other existing methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.
Bibliography:	Michigan State University (MSU); USDOE Office of Science (SC); National Science Foundation (NSF). Grant numbers: AC05-00OR22725; IIS-1302285; DMS-1721024. The authors have declared that no competing interests exist.
ISSN:	1553-734X, 1553-7358
DOI:	10.1371/journal.pcbi.1005929
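
Illustrative note: the summary pairs topological descriptors with the Wasserstein distance and k-nearest neighbors. The sketch below shows one way that pairing could look for tiny persistence diagrams given as lists of (birth, death) pairs. It is not the authors' implementation. The function names, the brute-force matching over permutations, the choice of the L-infinity ground metric, and the toy diagrams and affinity labels are all assumptions made for illustration; a real pipeline would use an optimal-assignment solver and diagrams computed from molecular structures.

```python
from itertools import permutations


def diagonal_cost(point, q):
    """Cost of matching a (birth, death) point to the diagonal (L-infinity metric)."""
    birth, death = point
    return ((death - birth) / 2.0) ** q


def wasserstein(diag_a, diag_b, q=2):
    """q-Wasserstein distance between two small persistence diagrams.

    Each diagram is augmented with diagonal slots for the other diagram's
    points, and the cheapest perfect matching is found by brute force.
    Only practical when len(diag_a) + len(diag_b) is small (about 8 or less).
    """
    n, m = len(diag_a), len(diag_b)
    size = n + m
    cost = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(m):
            # Point-to-point cost under the L-infinity ground metric.
            gap = max(abs(diag_a[i][0] - diag_b[j][0]),
                      abs(diag_a[i][1] - diag_b[j][1]))
            cost[i][j] = gap ** q
        for j in range(m, size):
            # Point of diag_a sent to the diagonal.
            cost[i][j] = diagonal_cost(diag_a[i], q)
    for i in range(n, size):
        for j in range(m):
            # Point of diag_b sent to the diagonal.
            cost[i][j] = diagonal_cost(diag_b[j], q)
        # Diagonal-to-diagonal entries stay at 0.0.
    best = min(sum(cost[i][perm[i]] for i in range(size))
               for perm in permutations(range(size)))
    return best ** (1.0 / q)


def knn_predict(query, train_diagrams, train_labels, k=1):
    """Average the labels of the k training diagrams closest to the query."""
    ranked = sorted(range(len(train_diagrams)),
                    key=lambda i: wasserstein(query, train_diagrams[i]))
    nearest = ranked[:k]
    return sum(train_labels[i] for i in nearest) / len(nearest)


# Hypothetical toy data: two training diagrams with made-up affinity labels.
train_diagrams = [[(0.0, 1.0)], [(0.0, 3.0)]]
train_labels = [5.0, 8.0]
query = [(0.0, 2.8)]
prediction = knn_predict(query, train_diagrams, train_labels, k=1)
```

In this toy case the query diagram is closest to the second training diagram, so the 1-nearest-neighbor prediction takes that diagram's label. Matching points to the diagonal is what lets diagrams with different numbers of points be compared.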