An efficient strategy for identifying essential proteins based on homology, subcellular location and protein-protein interaction information

Bibliographic Details
Published in: Mathematical biosciences and engineering: MBE, Vol. 19, no. 6, pp. 6331-6343
Main Authors: Zhang, Zhihong; Luo, Yingchun; Jiang, Meiping; Wu, Dongjie; Zhang, Wang; Yan, Wei; Zhao, Bihai
Format: Journal Article
Language: English
Published: AIMS Press, United States, 01.01.2022

Summary: High-throughput biological experiments are expensive and time-consuming. In recent years, many computational methods based on biological information have been proposed and widely used to understand the underlying biology. However, processing biological information inevitably produces false-positive and false-negative data, such as noise in protein-protein interaction (PPI) networks and noise introduced by integrating several types of biological information. Solving these noise problems is key to predicting essential proteins. This paper proposes IEPMSF, a model for Identifying Essential Proteins based on non-negative Matrix Symmetric tri-Factorization and multiple biological information. IEPMSF first uses only the common-neighbor characteristics of proteins in the PPI network to construct a weighted network, then applies non-negative matrix symmetric tri-factorization to uncover additional potential interactions between proteins and thereby optimize the weighted network. Next, the initial score of each protein is determined from subcellular location and homology (orthology) information, and a random walk with restart is run on the optimized network to score and rank every protein. We compared the proposed prediction model with current representative methods on a public database. The experimental results show that the new method identifies essential proteins efficiently, indicating that it can substantially reduce both the noise inherent in multi-source biological information and the noise caused by integrating it.
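The four-stage pipeline described in the summary can be sketched in plain Python as follows. This is a minimal, illustrative sketch under stated assumptions, not the authors' IEPMSF implementation. The Jaccard-style common-neighbor edge weight, the multiplicative update rules for the tri-factorization W ≈ G S Gᵀ, the use of the reconstructed matrix as the "optimized" network, the linear blend of location and homology scores (`alpha`), the rank `k`, and the restart probability are all assumptions, and every function name is hypothetical.

```python
import random


def matmul(A, B):
    """Dense product of two list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def common_neighbor_weights(n, edges):
    """Assumed weighting: Jaccard overlap of the closed neighbourhoods of each edge's endpoints."""
    nbrs = [{i} for i in range(n)]  # closed neighbourhoods: each protein plus its partners
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    W = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        W[u][v] = W[v][u] = len(nbrs[u] & nbrs[v]) / len(nbrs[u] | nbrs[v])
    return W


def symmetric_nmtf(W, k, iters=200, seed=0, eps=1e-12):
    """Approximate symmetric W by G S G^T with G (n x k) and S (k x k) kept non-negative.

    The multiplicative updates are an assumed, commonly used scheme; IEPMSF may differ.
    """
    rng = random.Random(seed)
    n = len(W)
    G = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    S0 = [[rng.random() + 0.1 for _ in range(k)] for _ in range(k)]
    S = [[(S0[i][j] + S0[j][i]) / 2 for j in range(k)] for i in range(k)]  # symmetric start
    for _ in range(iters):
        Gt = transpose(G)
        GtG = matmul(Gt, G)
        # S <- S * (G^T W G) / (G^T G S G^T G)
        num = matmul(matmul(Gt, W), G)
        den = matmul(matmul(GtG, S), GtG)
        S = [[S[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(k)]
        # G <- G * (1/2 + 1/2 * (W G S) / (G S G^T G S)); the 1/2 damping aids stability
        GS = matmul(G, S)
        num = matmul(W, GS)
        den = matmul(matmul(GS, GtG), S)
        G = [[G[i][j] * (0.5 + 0.5 * num[i][j] / (den[i][j] + eps)) for j in range(k)]
             for i in range(n)]
    return G, S


def initial_scores(loc, hom, alpha=0.5):
    """Hypothetical prior: blend per-protein location and homology scores, normalised to sum to 1.

    Assumes at least one protein has a positive blended score.
    """
    raw = [alpha * l + (1 - alpha) * o for l, o in zip(loc, hom)]
    total = sum(raw)
    return [x / total for x in raw]


def random_walk_with_restart(W, h, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) M p + r h, where M is W with each column normalised to sum to 1."""
    n = len(W)
    col_sums = [sum(W[i][j] for i in range(n)) for j in range(n)]
    p = h[:]
    for _ in range(max_iter):
        nxt = []
        for i in range(n):
            # columns with zero total weight (isolated proteins) spread no probability
            walk = sum(W[i][j] / col_sums[j] * p[j] for j in range(n) if col_sums[j] > 0)
            nxt.append((1 - restart) * walk + restart * h[i])
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


def rank_proteins(n, edges, loc, hom, k=2):
    """Weight the PPI network, densify it via G S G^T, then rank proteins by RWR score."""
    W = common_neighbor_weights(n, edges)
    G, S = symmetric_nmtf(W, k)
    R = matmul(matmul(G, S), transpose(G))
    # Assumption: the reconstruction (zero diagonal) serves as the optimised weighted network
    W_opt = [[R[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    scores = random_walk_with_restart(W_opt, initial_scores(loc, hom))
    return sorted(range(n), key=lambda i: -scores[i]), scores
```

For example, `rank_proteins(4, [(0, 1), (1, 2), (0, 2), (2, 3)], [0.9, 0.2, 0.8, 0.1], [0.7, 0.3, 0.9, 0.0])` returns protein indices ordered from most to least likely essential under these assumptions, together with their scores. Because the reconstruction G S Gᵀ assigns weight to protein pairs that are not directly connected, the walk can also reach proteins through potential interactions absent from the original PPI data.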
ISSN: 1551-0018
DOI: 10.3934/mbe.2022296