Accurate prediction of species-specific 2-hydroxyisobutyrylation sites based on machine learning frameworks

Bibliographic Details
Published in Analytical Biochemistry Vol. 602; p. 113793
Main Authors Wang, You-Gan, Huang, Shu-Yun, Wang, Li-Na, Zhou, Zhi-You, Qiu, Jian-Ding
Format Journal Article
Language English
Published United States Elsevier Inc 01.08.2020
Summary: Lysine 2-hydroxyisobutyrylation (Khib) is a recently discovered post-translational modification (PTM) found in both eukaryotes and prokaryotes that plays a significant role in diverse cellular functions. Accurate prediction of Khib sites is a crucial first step toward deciphering its molecular mechanism and is urgently needed. In this work, based on large multi-species benchmark datasets, a novel online species-specific prediction tool, KhibPred, was developed to identify Khib sites. Four types of feature strategies, covering sequence-based information, physicochemical properties and evolutionary-derived information, were applied to represent a wide range of protein sequences, and random forest was used to build the optimal feature datasets. Moreover, six representative machine learning (ML) methods were trained, compared and discussed for each organism. Data analyses suggested that each species has its own protein sequence preferences. When evaluated on independent test datasets, the areas under the receiver operating characteristic curve (AUCs) reached 0.807, 0.781, 0.825 and 0.831 for Saccharomyces cerevisiae, Physcomitrella patens, rice seeds and HeLa cells, respectively. These results suggest that KhibPred is a promising computational tool. The online predictor is freely available at: http://bioinfo.ncu.edu.cn/KhibPred.aspx.
• A novel online predictor with satisfactory performance is developed to identify 2-hydroxyisobutyrylation sites.
• Six representative machine learning methods are trained and systematically compared using up-to-date training datasets.
• Random forest is employed to construct the optimal feature subset.
• A user-friendly online web service is freely accessible to the public: http://bioinfo.ncu.edu.cn/KhibPred.aspx.
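
The summary mentions sequence-based features and AUC evaluation without giving details. The following is a minimal illustrative sketch, not KhibPred's actual implementation: it extracts a fixed-length window around each lysine, encodes it as amino acid composition, and computes AUC from predicted scores. The function names, the flank size, and the "X" padding character are all assumptions made for illustration; the record does not state the window length or the exact encodings used.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def lysine_windows(sequence, flank=3, pad="X"):
    """Yield (1-based position, window) for every lysine (K) in a protein.

    Each window has length 2 * flank + 1 with the lysine at its center.
    Positions beyond the sequence ends are filled with `pad` (an assumption).
    """
    padded = pad * flank + sequence + pad * flank
    for i, residue in enumerate(sequence):
        if residue == "K":
            yield i + 1, padded[i : i + 2 * flank + 1]


def composition(window):
    """Encode a window as the fraction of each standard amino acid.

    Padding characters count toward the window length but not toward
    any amino acid, so the fractions may sum to less than 1.
    """
    counts = Counter(window)
    return [counts[aa] / len(window) for aa in AMINO_ACIDS]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1)
    scores higher than a randomly chosen negative (label 0); ties
    count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `list(lysine_windows("MKTAK", flank=2))` yields one window per lysine, each of which could be passed to `composition` to produce a feature vector for a classifier. The pairwise AUC is quadratic in the number of samples, which is fine for illustration; a rank-based formulation scales better on large test sets.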
ISSN: 0003-2697, 1096-0309
DOI: 10.1016/j.ab.2020.113793