RFAthM6A: a new tool for predicting m6A sites in Arabidopsis thaliana

Bibliographic Details
Published in	Plant molecular biology Vol. 96; no. 3; pp. 327–337
Main Authors	Wang, Xiaofeng, Yan, Renxiang
Format	Journal Article
Language	English
Published	Dordrecht: Springer Netherlands, 01.02.2018; Springer Nature B.V.
Subjects	Alternative splicing; Arabidopsis thaliana; Benchmarks; Biochemistry; Biomedical and Life Sciences; computational methodology; Computer applications; data collection; Gene expression; Life Sciences; Localization; N6-methyladenosine; Nuclear transport; physiological transport; Plant Pathology; Plant Sciences; prediction; Prediction models; Ribonucleic acid; RNA; RNA transport; transcriptome; translation (genetics); m6A; Random forest; N6-methyladenine; Prediction
Summary:	Key message: We curated a reliable dataset of m6A sites in Arabidopsis thaliana, built competitive models for predicting m6A sites, extracted predominant rules from the prediction models and analyzed the most important features. Approximately 150 chemical modifications have been discovered in biological RNA, of which N6-methyladenosine (m6A) is the most prevalent and abundant. This modification plays an essential role in a myriad of biological mechanisms, regulating RNA localization, nuclear export, translation, stability, alternative splicing and other processes. However, m6A-seq and other wet-lab techniques do not readily allow accurate and complete determination of m6A sites across the transcriptome, so computational methods that can accurately predict m6A sites are essential. In this work, we manually curated a reliable dataset of m6A and non-m6A sites and developed RFAthM6A, a new tool for predicting m6A sites in Arabidopsis thaliana. RFAthM6A consists of four independent models: RFPSNSP, RFPSDSP, RFKSNPF and RFKNF. Under strict benchmarks using fivefold cross-validation, these four models achieved AUC values of 0.894, 0.914, 0.920 and 0.926, respectively, and RFPSDSP, RFKSNPF and RFKNF outperformed three previously reported models (AthMethPre, M6ATH and RAM-NPPS). A linear combination of the prediction scores of RFPSDSP, RFKSNPF and RFKNF further improved prediction performance. We also extracted from the trained models several predominant rules that underlie m6A site identification, and analyzed in depth the features most important to the predictors. To facilitate use of the proposed models by interested researchers, all source code and datasets are publicly available at https://github.com/nongdaxiaofeng/RFAthM6A.
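The abstract reports model quality as AUC under fivefold cross-validation and notes that linearly combining the prediction scores of three models improved performance. The following is a minimal Python sketch, not the RFAthM6A code, of those two ideas: a rank-based (Mann-Whitney) AUC and a weighted sum of per-model scores for the same candidate sites. The function names `auc` and `combine`, the equal weights and the toy scores are hypothetical illustrations; the paper's actual weights, features and data are not reproduced here.

```python
from itertools import product


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
    positive site (label 1) scores higher than a randomly chosen negative
    site (label 0), with ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative site")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def combine(score_lists, weights):
    """Weighted linear combination of several models' scores.

    score_lists[m][i] is model m's score for candidate site i; the weights
    here are hypothetical and would normally be tuned on training folds."""
    if len(score_lists) != len(weights):
        raise ValueError("one weight per model is required")
    n = len(score_lists[0])
    if any(len(s) != n for s in score_lists):
        raise ValueError("all models must score the same sites")
    return [sum(w * s[i] for w, s in zip(weights, score_lists)) for i in range(n)]


if __name__ == "__main__":
    # Toy data only: three m6A sites (1) and three non-m6A sites (0).
    labels = [1, 1, 1, 0, 0, 0]
    model_a = [0.9, 0.4, 0.8, 0.5, 0.1, 0.3]  # misranks site 2 vs. site 4
    model_b = [0.3, 0.9, 0.7, 0.2, 0.4, 0.1]  # misranks site 1 vs. site 5
    combined = combine([model_a, model_b], [0.5, 0.5])
    print(f"model A AUC: {auc(model_a, labels):.3f}")    # model A AUC: 0.889
    print(f"model B AUC: {auc(model_b, labels):.3f}")    # model B AUC: 0.889
    print(f"combined AUC: {auc(combined, labels):.3f}")  # combined AUC: 1.000
```

In this toy case each model misranks a different positive/negative pair, so their average ranks every positive above every negative. A linear combination can only help in this way when the models' errors differ; with three models, as in the abstract, pass three score lists and three weights.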
ISSN:	0167-4412, 1573-5028
DOI:	10.1007/s11103-018-0698-9