iPReditor-CMG: Improving a predictive RNA editor for crop mitochondrial genomes using genomic sequence features and an optimal support vector machine

Bibliographic Details
Published in Phytochemistry (Oxford) Vol. 200; p. 113222
Main Authors Qin, Sidong; Fan, Yanjun; Hu, Shengnan; Wang, Yongqiang; Wang, Ziqi; Cao, Yixiang; Liu, Qiyuan; Tan, Siqiao; Dai, Zhijun; Zhou, Wei
Format Journal Article
Language English
Published England Elsevier Ltd 01.08.2022

Summary: In crops, RNA editing is one of the most important post-transcriptional processes, in which specific cytidines (C) in virtually all mitochondrial protein-coding genes are converted to uridines (U). Despite extensive recent research on RNA editing, exploring all C-to-U editing events efficiently on the genomic scale remains challenging. Accurate methods for predicting RNA editing sites would greatly reduce the need for experimental determination. We therefore propose a novel method, iPReditor-CMG (improved predictive RNA editor for crop mitochondrial genomes), which predicts crop mitochondrial editing sites from genome sequences using an optimised support vector machine (SVM).

We first selected three mitochondrial genomes with known RNA editing sites, from Arabidopsis thaliana, Brassica napus and Oryza sativa, released by NCBI, as the training and test sets. Genes and transcripts of the tobacco mitochondrial ATPase, sequenced by the authors, served as the validation set. iPReditor-CMG first encoded the genome sequences as numerical vectors and then performed efficient feature selection on the high-dimensional feature space, with the SVM used both for feature selection and for the subsequent modelling.

The average independent prediction accuracy for intraspecific editing sites across the three species was 0.85, reaching 0.91 in A. thaliana, which outperformed the reference models. For interspecific independent prediction, the accuracy between dicotyledons was 0.78 and the accuracy between dicotyledons and monocotyledons was 0.56, which suggests that the C-to-U editing mechanism may be similar in close relatives. Finally, the best model achieved an independent test accuracy of 0.91 and an AUC of 0.88, and it suggested that five previously unreported feature sequences, i.e. TGACA, ACAAC, GTAGA, CCGTT and TAACA, are closely associated with the editing phenomenon.

Multiple tests indicated that iPReditor-CMG can be effectively applied to predict editing sites in crop mitochondria, which may further contribute to understanding the mechanisms of site editing and post-transcriptional events in crop mitochondria. iPReditor-CMG performed excellently in the independent prediction of intraspecific editing sites, and its interspecific results suggest that RNA editing mechanisms may be species-specific.

Highlights:
• The intraspecific prediction accuracy for editing sites across three species was 0.85–0.91.
• The accuracy of mutual prediction between dicotyledons and monocotyledons was 0.56.
• Similarity in the C-to-U editing mechanism in close relatives is suggested.
• The best model was identified with an independent test accuracy of 0.91 and an AUC of 0.88.
• It is suggested that five unreported feature sequences are closely associated with the editing phenomenon.
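
The summary says that the genome sequences are coded as numerical vectors before feature selection and SVM modelling, but it does not give the encoding. The Python sketch below illustrates one plausible scheme under stated assumptions, not the authors' implementation: each candidate cytidine is represented by the k-mer counts of a fixed flanking window. The window size (`flank`), the choice k = 5 (motivated only by the five-base feature sequences reported above) and all function names are hypothetical.

```python
from itertools import product


def kmer_vocabulary(k, alphabet="ACGT"):
    """Return all k-mers over the alphabet in lexicographic order."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def flanking_window(seq, pos, flank):
    """Return `flank` bases on each side of seq[pos], or None.

    None is returned when the base is not a C or when the window
    would run past either end of the sequence.
    """
    if seq[pos] != "C" or pos - flank < 0 or pos + flank >= len(seq):
        return None
    return seq[pos - flank: pos + flank + 1]


def kmer_counts(window, k, vocab_index):
    """Count overlapping k-mers of the window into a fixed-length vector."""
    vec = [0] * len(vocab_index)
    for i in range(len(window) - k + 1):
        j = vocab_index.get(window[i:i + k])
        if j is not None:  # skip k-mers with ambiguous bases such as N
            vec[j] += 1
    return vec


def encode_candidate_sites(seq, flank=10, k=5):
    """Encode every C with a complete flanking window as a k-mer count vector.

    Assumed encoding for illustration only; returns the k-mer vocabulary
    and a list of (position, vector) pairs.
    """
    seq = seq.upper()
    vocab = kmer_vocabulary(k)
    index = {kmer: i for i, kmer in enumerate(vocab)}
    rows = []
    for pos in range(len(seq)):
        window = flanking_window(seq, pos, flank)
        if window is not None:
            rows.append((pos, kmer_counts(window, k, index)))
    return vocab, rows
```

With k = 5 each site becomes a 1024-dimensional vector, which is the kind of high-dimensional space in which feature selection becomes worthwhile. The resulting vectors, paired with edited or unedited labels, could then be passed to any SVM implementation; that step is omitted here because the summary does not specify the kernel or the selection procedure.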
ISSN: 0031-9422, 1873-3700
DOI: 10.1016/j.phytochem.2022.113222