Deepm5C: A deep-learning-based hybrid framework for identifying human RNA N5-methylcytosine sites using a stacking strategy

Bibliographic Details
Published in	Molecular therapy Vol. 30; no. 8; pp. 2856 - 2867
Main Authors	Hasan, Md Mehedi; Tsukiyama, Sho; Cho, Jae Youl; Kurata, Hiroyuki; Alam, Md Ashad; Liu, Xiaowen; Manavalan, Balachandran; Deng, Hong-Wen
Format	Journal Article
Language	English
Published	United States: Elsevier Inc, 03.08.2022; American Society of Gene & Cell Therapy
Subjects	Algorithms; baseline models; bioinformatics; Computational Biology - methods; Deep Learning; epigenetic regulation; Humans; Machine Learning; Original; prediction model; RNA - genetics; RNA N5-methylcytosine; sequence analysis; stacking framework; systematic evaluation

More Information
Summary:	As one of the most prevalent post-transcriptional epigenetic modifications, N5-methylcytosine (m5C) plays an essential role in various cellular processes and disease pathogenesis. Therefore, it is important to accurately identify m5C modifications in order to gain a deeper understanding of cellular processes and other possible functional mechanisms. Although a few computational methods have been proposed, their respective models have been developed using small training datasets. Hence, their practical application in genome-wide detection is quite limited. To overcome the existing limitations, we propose Deepm5C, a bioinformatics method for identifying RNA m5C sites throughout the human genome. To develop Deepm5C, we constructed a novel benchmarking dataset and investigated a mixture of three conventional feature-encoding algorithms and a feature derived from word-embedding approaches. Afterward, four variants of deep-learning classifiers and four commonly used conventional classifiers were trained with the four encodings, yielding 32 baseline models. A stacking strategy was then applied: the predicted outputs of the optimal baseline models were integrated and used to train a one-dimensional (1D) convolutional neural network. As a result, the Deepm5C predictor achieved excellent performance during cross-validation, with a Matthews correlation coefficient of 0.697 and an accuracy of 0.855. The corresponding metrics during the independent test were 0.691 and 0.852, respectively. Overall, Deepm5C achieved a more accurate and stable performance than the baseline models and significantly outperformed the existing predictors, demonstrating the effectiveness of our proposed hybrid framework. Furthermore, Deepm5C is expected to assist community-wide efforts in identifying putative m5Cs and in formulating novel testable biological hypotheses.
To develop Deepm5C, we constructed a novel benchmarking dataset and investigated a mixture of three conventional feature-encoding algorithms and a feature derived from word-embedding approaches. A stacking strategy integrates the predicted outputs of the baseline models, which are then used to train a 1D convolutional neural network. Deepm5C achieved a more accurate and stable performance than the baseline models and significantly outperformed the existing predictors.
ISSN:	1525-0016, 1525-0024
DOI:	10.1016/j.ymthe.2022.05.001
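
The workflow outlined in the Summary can be illustrated roughly as follows. This is a minimal sketch and not the authors' implementation: the Summary gives only the counts (four encodings, four deep-learning and four conventional classifiers), so the encoding and classifier labels below are hypothetical placeholders, and the function names are assumptions made for illustration. The sketch shows how 8 classifiers paired with 4 encodings give 32 baseline models, how the predicted outputs of selected baseline models could be arranged as input rows for a meta-learner (a 1D convolutional neural network in the Summary, which is not implemented here), and how accuracy and the Matthews correlation coefficient are computed from confusion-matrix counts using their standard definitions.

```python
from itertools import product
from math import sqrt

# Placeholder labels only (assumption): the Summary states how many encodings
# and classifiers were used, not what they are called.
ENCODINGS = [f"encoding_{i}" for i in range(1, 5)]
CLASSIFIERS = ([f"deep_{i}" for i in range(1, 5)]
               + [f"conventional_{i}" for i in range(1, 5)])


def baseline_grid():
    """Pair every classifier with every encoding (8 x 4 = 32 combinations)."""
    return list(product(CLASSIFIERS, ENCODINGS))


def stack_features(baseline_probs):
    """Build one meta-feature row per sequence from baseline predictions.

    baseline_probs maps a (classifier, encoding) pair to a list of predicted
    probabilities, one per sequence. Each returned row collects the outputs
    of all supplied baseline models for a single sequence, in sorted key
    order, ready to be passed to a meta-learner.
    """
    keys = sorted(baseline_probs)
    return [list(row) for row in zip(*(baseline_probs[k] for k in keys))]


def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if the denominator is 0."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

For example, `len(baseline_grid())` gives 32, matching the number of baseline models in the Summary, and `stack_features` accepts any subset of those models, in line with the Summary's statement that only the optimal baseline models are integrated.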