NetGO 3.0: Protein Language Model Improves Large-scale Functional Annotations

Bibliographic Details
Published in	Genomics, Proteomics & Bioinformatics, Vol. 21, no. 2, pp. 349-358
Main Authors	Wang, Shaojun; You, Ronghui; Liu, Yunjia; Xiong, Yi; Zhu, Shanfeng
Format	Journal Article
Language	English
Published	China: Elsevier B.V., 01.04.2023
Subjects	Large-scale multi-label learning; Learning to rank; Protein function prediction; Protein language model; Web server; Web service

More Information
Summary:	NetGO 2.0, one of the state-of-the-art automated function prediction (AFP) methods, integrates multi-source information to improve prediction performance. However, it relies mainly on proteins with experimentally supported functional annotations and does not leverage the valuable information in the vast number of unannotated proteins. Recently, protein language models have been proposed to learn informative representations [e.g., the Evolutionary Scale Modeling (ESM)-1b embedding] from protein sequences through self-supervision. Here, we represented each protein by its ESM-1b embedding and used logistic regression (LR) to train a new model, LR-ESM, for AFP. Experimental results showed that LR-ESM achieved performance comparable to that of the best-performing component of NetGO 2.0. We therefore incorporated LR-ESM into NetGO 2.0 to develop NetGO 3.0, which extensively improves AFP performance. NetGO 3.0 is freely accessible at https://dmiip.sjtu.edu.cn/ng3.0.
ISSN:	1672-0229 2210-3244
DOI:	10.1016/j.gpb.2023.04.001
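
Illustration:	The summary describes LR-ESM as logistic regression trained on ESM-1b embeddings. The sketch below shows only the general idea (one logistic-regression classifier per function term, then ranking terms by predicted probability) and is not the authors' implementation. Everything in it is an assumption made for illustration: the small random vectors stand in for real ESM-1b embeddings, the labels `term_A`, `term_B`, and `term_C` are hypothetical placeholders rather than Gene Ontology terms, and the hand-written gradient descent replaces whatever solver the paper used.

```python
import math
import random

random.seed(0)
DIM = 8                                  # toy size; real embeddings are far larger
TERMS = ["term_A", "term_B", "term_C"]   # hypothetical labels, not real GO terms


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(model, x):
    """Predicted probability that vector x carries the term of this model."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_lr(X, y, epochs=200, step=0.5, l2=1e-3):
    """Fit one binary logistic-regression model by batch gradient descent."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = score((w, b), xi) - yi          # gradient of the log loss
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - step * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= step * grad_b / n
    return w, b


def accuracy(model, X, y):
    """Fraction of examples classified correctly at a 0.5 threshold."""
    hits = sum((score(model, xi) >= 0.5) == bool(yi) for xi, yi in zip(X, y))
    return hits / len(X)


def predict_scores(models, x):
    """Score every term for one protein vector."""
    return {term: score(model, x) for term, model in models.items()}


# Synthetic stand-ins for protein embeddings, with labels defined by simple
# linear rules so that a linear classifier can learn them.
X = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(80)]
Y = {
    "term_A": [int(x[0] > 0) for x in X],
    "term_B": [int(x[1] + x[2] > 0) for x in X],
    "term_C": [int(x[3] < 0) for x in X],
}

# One independent classifier per term (a one-vs-rest multi-label setup).
models = {term: train_lr(X, Y[term]) for term in TERMS}

# Rank the terms for the first protein by predicted probability.
ranked = sorted(predict_scores(models, X[0]).items(), key=lambda kv: -kv[1])
for term, prob in ranked:
    print(f"{term}\t{prob:.3f}")
```

Training one classifier per term keeps the example short. A real annotation setting involves many thousands of terms and far larger embeddings, so a library solver would be the practical choice there.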