Deep learning program to predict protein functions based on sequence information

Bibliographic Details
Published in: MethodsX, Vol. 9, p. 101622
Main Authors: Ko, Chang Woo; Huh, June; Park, Jong-Wan
Format: Journal Article
Language: English
Published: Netherlands, Elsevier B.V., 01.01.2022

Summary:
•A new deep learning program to predict protein functions in silico.
•Requires nothing more than protein sequence information.
•Sequence segmentation improves the efficiency of prediction.
•Predicts the clinical impact of mutations or polymorphisms.

Deep learning technologies have been adopted to predict the functions of newly identified proteins in silico. However, most current models are not suitable for poorly characterized proteins because they require diverse information about the target proteins. We designed a binary-classification deep learning program that requires only sequence information and named it ‘FUTUSA’ (function teller using sequence alone). During sequence feature extraction by a convolutional neural network, FUTUSA applies sequence segmentation so that the network learns regional sequence patterns and the relationships between them. This segmentation improved predictive performance by 49% compared with processing the full-length sequence. Compared with a baseline method, our approach achieved higher performance in predicting oxidoreductase activity. FUTUSA also performed strongly in predicting acetyltransferase and demethylase activities. Next, we tested whether FUTUSA can predict the functional consequences of point mutations. After being trained on monooxygenase activity, FUTUSA successfully predicted the impact of point mutations in phenylalanine hydroxylase, which is responsible for the inherited metabolic disease phenylketonuria (PKU). This deep learning program can be used as a first-step tool for characterizing newly identified or poorly studied proteins.

•We propose a new deep learning program to predict protein functions in silico that requires nothing more than protein sequence information.
•Applying sequence segmentation improves the efficiency of prediction.
•The method makes it possible to predict the clinical impact of mutations or polymorphisms.
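The summary does not state FUTUSA's segment length, overlap, input encoding, or network architecture. As a minimal sketch of the kind of preprocessing it describes, the Python below assumes one-hot encoding of the 20 standard amino acids, fixed-length segments taken with a hypothetical stride and padded with 'X', and a simple wild-type/position/mutant notation for point mutations. The functions `one_hot`, `segment`, and `apply_point_mutation` are hypothetical illustrations, not part of FUTUSA, and the convolutional network itself is not shown.

```python
from typing import List

# The 20 standard amino acids; this alphabet order is an arbitrary choice.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence: str) -> List[List[int]]:
    """Encode a protein sequence as a list of 20-dimensional one-hot vectors.

    Non-standard residues and padding (e.g. 'X') map to an all-zero vector.
    """
    encoded = []
    for residue in sequence.upper():
        vector = [0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(residue)
        if idx is not None:
            vector[idx] = 1
        encoded.append(vector)
    return encoded


def segment(sequence: str, segment_length: int, stride: int) -> List[str]:
    """Split a sequence into fixed-length, possibly overlapping segments.

    The last segment is right-padded with 'X' so all segments share one length.
    segment_length and stride are hypothetical hyperparameters, not FUTUSA's values.
    """
    if segment_length <= 0 or stride <= 0:
        raise ValueError("segment_length and stride must be positive")
    segments = []
    start = 0
    while True:
        chunk = sequence[start:start + segment_length]
        segments.append(chunk.ljust(segment_length, "X"))
        # Stop once this window reaches the end of the sequence.
        if start + segment_length >= len(sequence):
            break
        start += stride
    return segments


def apply_point_mutation(sequence: str, mutation: str) -> str:
    """Apply a substitution written as wild-type, 1-based position, mutant (e.g. 'V5W')."""
    wild_type, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[position - 1]}"
        )
    return sequence[:position - 1] + mutant + sequence[position:]
```

For a toy sequence, `segment("MSTAVLENPG", 4, 3)` yields the overlapping windows `["MSTA", "AVLE", "ENPG"]`, each of which could then be passed through `one_hot` as a per-segment input. Encoding a wild-type sequence and a mutant produced by `apply_point_mutation` in the same way is one plausible route to comparing a trained classifier's predictions before and after a mutation.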
ISSN: 2215-0161
DOI: 10.1016/j.mex.2022.101622