VotePLMs-AFP: Identification of antifreeze proteins using transformer-embedding features and ensemble learning

Bibliographic Details
Published in: Biochimica et biophysica acta. General subjects, Vol. 1868; no. 12; p. 130721
Main Authors: Qi, Dawei; Liu, Taigang
Format: Journal Article
Language: English
Published: Netherlands, Elsevier B.V., 01.12.2024
Summary: Antifreeze proteins (AFPs) are a unique class of biomolecules that protect other proteins, cell membranes, and cellular structures within organisms from damage caused by freezing. Given the significance of AFPs in biotechnology, agriculture, and medicine, several machine learning methods have been developed to identify them. However, because AFPs are complex and diverse, the predictive performance of existing methods is limited, and an efficient, rapid computational method for accurately predicting AFPs is still needed. In this study, we propose VotePLMs-AFP, a novel predictor for the identification of AFPs based on transformer-embedding features and ensemble learning. First, three types of feature descriptors are extracted from pre-trained protein language models (PLMs). Next, six combinations generated from these three embeddings are analyzed to find the optimal feature set, which is then fed into a soft voting-based ensemble classifier to identify AFPs. Finally, the model is evaluated on two benchmark datasets. The experimental results show that the model achieves high prediction accuracy in 10-fold cross-validation (CV) and independent set testing, outperforming existing state-of-the-art methods. The model could therefore serve as an effective tool for predicting AFPs.

Highlights:
• Integrates pre-trained PLMs into the AFP identification task.
• The ensemble classifier improves the stability and robustness of the model.
• Achieves new state-of-the-art performance in the identification of AFPs.
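The soft voting step named in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say which base classifiers or PLM embeddings are used, so the three probability tables below (`model_a`, `model_b`, `model_c`), the function names, and the label convention (class 1 = AFP, class 0 = non-AFP) are all hypothetical. The sketch only shows the general technique: each base classifier outputs class probabilities, the probabilities are averaged with equal weights, and the class with the highest mean probability is predicted.

```python
from typing import List, Sequence

# probs[i][c] is one model's probability of class c for sample i.
Probs = Sequence[Sequence[float]]


def soft_vote(model_probs: Sequence[Probs]) -> List[List[float]]:
    """Average per-class probabilities over all base classifiers (equal weights)."""
    if not model_probs:
        raise ValueError("need at least one model")
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    averaged = []
    for i in range(n_samples):
        n_classes = len(model_probs[0][i])
        averaged.append([
            sum(p[i][c] for p in model_probs) / n_models
            for c in range(n_classes)
        ])
    return averaged


def predict(averaged: Sequence[Sequence[float]]) -> List[int]:
    """Pick the class with the highest averaged probability for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in averaged]


# Hypothetical outputs of three base classifiers for two protein sequences.
# Columns are [P(non-AFP), P(AFP)]; the numbers are made up for illustration.
model_a = [[0.90, 0.10], [0.40, 0.60]]
model_b = [[0.60, 0.40], [0.45, 0.55]]
model_c = [[0.30, 0.70], [0.80, 0.20]]

averaged = soft_vote([model_a, model_b, model_c])
labels = predict(averaged)
print(labels)  # [0, 0]
```

For the second sequence, two of the three hypothetical models lean towards class 1, so a simple majority (hard) vote would predict AFP. Soft voting instead predicts class 0, because the third model's confident probability outweighs the two narrow margins. This sensitivity to classifier confidence is the main design difference between soft and hard voting.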
ISSN: 0304-4165, 1872-8006
DOI: 10.1016/j.bbagen.2024.130721