NetGO 3.0: Protein Language Model Improves Large-scale Functional Annotations

Bibliographic Details
Published in: Genomics, Proteomics & Bioinformatics, Vol. 21, no. 2, pp. 349-358
Main Authors: Wang, Shaojun; You, Ronghui; Liu, Yunjia; Xiong, Yi; Zhu, Shanfeng
Format: Journal Article
Language: English
Published: China, Elsevier B.V., 1 April 2023

Summary: As one of the state-of-the-art automated function prediction (AFP) methods, NetGO 2.0 integrates multi-source information to improve prediction performance. However, it mainly uses proteins with experimentally supported functional annotations and does not leverage the valuable information in the vast number of unannotated proteins. Recently, protein language models have been proposed that learn informative representations [e.g., Evolutionary Scale Modeling (ESM)-1b embeddings] from protein sequences through self-supervision. Here, we represented each protein by its ESM-1b embedding and used logistic regression (LR) to train a new model, LR-ESM, for AFP. The experimental results showed that LR-ESM achieved performance comparable to that of the best-performing component of NetGO 2.0. Therefore, by incorporating LR-ESM into NetGO 2.0, we developed NetGO 3.0, which substantially improves AFP performance. NetGO 3.0 is freely accessible at https://dmiip.sjtu.edu.cn/ng3.0.
ISSN: 1672-0229 (print); 2210-3244 (online)
DOI: 10.1016/j.gpb.2023.04.001
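The summary describes LR-ESM only at a high level: each protein becomes a fixed-length ESM-1b vector, and logistic regression maps that vector to Gene Ontology (GO) term scores. The following is a minimal sketch of that idea, not the authors' code. It assumes that per-residue embeddings are mean-pooled into one vector per protein and that one independent binary logistic regression is trained per GO term (one-vs-rest). The function names (`mean_pool`, `train_lr_per_term`, `predict_scores`), the hyperparameters, and the synthetic data are all hypothetical. Real ESM-1b embeddings are 1280-dimensional and would come from the pretrained model; small random vectors stand in for them here.

```python
import numpy as np

def mean_pool(residue_emb: np.ndarray) -> np.ndarray:
    """Average an (L, d) per-residue embedding into one d-dim protein vector (assumed pooling)."""
    return residue_emb.mean(axis=0)

def sigmoid(z: np.ndarray) -> np.ndarray:
    z = np.clip(z, -30.0, 30.0)  # keep exp() from overflowing
    return 1.0 / (1.0 + np.exp(-z))

def train_lr_per_term(X, Y, lr=0.5, epochs=1000, l2=1e-3):
    """Hypothetical helper: one binary logistic regression per GO term (column of Y),
    fit jointly by full-batch gradient descent. The per-term losses do not interact,
    so this is equivalent to training each column on its own."""
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(epochs):
        G = sigmoid(X @ W + b) - Y        # d(log-loss)/d(logit) for each protein/term pair
        W -= lr * (X.T @ G / n + l2 * W)  # L2-regularised weight update
        b -= lr * G.mean(axis=0)          # bias is left unregularised
    return W, b

def predict_scores(X, W, b):
    """Return an (n_proteins, n_terms) matrix of per-term probabilities."""
    return sigmoid(X @ W + b)

# Toy usage with synthetic "embeddings" (hypothetical sizes; ESM-1b would give d = 1280).
rng = np.random.default_rng(0)
d, n_terms = 16, 3
proteins = [rng.normal(size=d) + 0.5 * rng.normal(size=(int(rng.integers(50, 200)), d))
            for _ in range(200)]
X = np.stack([mean_pool(p) for p in proteins])
true_W = rng.normal(size=(d, n_terms))
Y = (X @ true_W > 0).astype(float)        # hypothetical GO labels, one column per term
W, b = train_lr_per_term(X, Y)
scores = predict_scores(X, W, b)
train_acc = ((scores > 0.5) == (Y == 1)).mean()
```

Because each GO term gets its own sigmoid output, a single protein can score high for several terms at once, which matches the multi-label nature of GO annotation. How NetGO 3.0 combines these LR-ESM scores with the other NetGO 2.0 components is not shown here.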