RFAthM6A: a new tool for predicting m6A sites in Arabidopsis thaliana

Bibliographic Details
Published in: Plant Molecular Biology, Vol. 96, no. 3, pp. 327-337
Main Authors: Wang, Xiaofeng; Yan, Renxiang
Format: Journal Article
Language: English
Published: Dordrecht, Springer Netherlands, 01.02.2018; Springer Nature B.V.

Summary: Key message: We curated a reliable dataset of m6A sites in Arabidopsis thaliana, built competitive models for predicting m6A sites, extracted predominant rules from the prediction models, and analyzed the most important features.

In biological RNA, approximately 150 chemical modifications have been discovered, of which N6-methyladenosine (m6A) is the most prevalent and abundant. This modification plays an essential role in a myriad of biological mechanisms, regulating RNA localization, nuclear export, translation, stability, alternative splicing, and other processes. However, m6A-seq and other wet-lab techniques cannot easily determine m6A sites accurately and completely across the transcriptome, so computational methods that build accurate models for predicting m6A sites are essential.

In this work, we manually curated a reliable dataset of m6A and non-m6A sites and developed a new tool, RFAthM6A, for predicting m6A sites in Arabidopsis thaliana. RFAthM6A consists of four independent models: RFPSNSP, RFPSDSP, RFKSNPF and RFKNF. Strict benchmarks show that, in fivefold cross-validation, the four models reached AUC values of 0.894, 0.914, 0.920 and 0.926, respectively, and that RFPSDSP, RFKSNPF and RFKNF outperformed three previously reported models (AthMethPre, M6ATH and RAM-NPPS). A linear combination of the prediction scores of RFPSDSP, RFKSNPF and RFKNF improved prediction performance further. We also extracted several predominant rules for m6A site identification from the trained models and analyzed in depth the features most important to the predictors. To make the models easy for interested researchers to use, all source code and datasets are publicly available at https://github.com/nongdaxiaofeng/RFAthM6A.
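The summary reports per-model AUC values and says that a linear combination of prediction scores helped, but it does not spell out the feature encodings or the combination weights. The following is a minimal, stdlib-only sketch under stated assumptions, not the authors' implementation. It treats "KNF" as a k-mer nucleotide frequency encoding (an assumption; this record does not expand the acronym), leaves out the classifiers themselves, and shows AUC computed as a rank statistic together with a weighted sum of per-model scores. The function names kmer_frequencies, roc_auc and combine_scores, the equal weights, and all scores are hypothetical toy values, not results from the paper.

```python
from itertools import product


def kmer_frequencies(seq, k):
    """Normalized k-mer counts over A, C, G, U (hypothetical KNF-style encoding).

    T is read as U so DNA-style windows also work; k-mers containing other
    characters (e.g. N) are skipped but still count toward the denominator.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of overlapping windows of length k
    if total <= 0:
        return [0.0] * len(kmers)
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    return [counts[m] / total for m in kmers]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive site outscores a random
    negative site (Mann-Whitney formulation; ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative site")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def combine_scores(score_lists, weights):
    """Linear combination: for each site, sum weight * score over the models."""
    if len(score_lists) != len(weights):
        raise ValueError("one weight per model is required")
    return [sum(w * s for w, s in zip(weights, site_scores))
            for site_scores in zip(*score_lists)]


if __name__ == "__main__":
    print(kmer_frequencies("ACGUA", 1))  # [0.4, 0.2, 0.2, 0.2]

    # Toy scores for four sites (two m6A, two non-m6A); not data from the paper.
    labels = [1, 1, 0, 0]
    model_a = [0.9, 0.4, 0.6, 0.1]
    model_b = [0.5, 0.95, 0.7, 0.2]
    combined = combine_scores([model_a, model_b], [0.5, 0.5])
    print(roc_auc(labels, model_a), roc_auc(labels, model_b), roc_auc(labels, combined))
    # 0.75 0.75 1.0
```

In this toy case each model misranks a different pair of sites, so their equal-weight sum ranks every positive above every negative. That is the kind of complementarity a score-level combination can exploit. The equal weights are purely illustrative; this record does not state how the published combination was weighted. In a real evaluation the scores would come from trained classifiers scored on held-out folds of the cross-validation.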
ISSN: 0167-4412, 1573-5028
DOI: 10.1007/s11103-018-0698-9