PacBio full‐length cDNA sequencing integrated with RNA‐seq reads drastically improves the discovery of splicing transcripts in rice

Bibliographic Details
Published in The Plant journal : for cell and molecular biology Vol. 97; no. 2; pp. 296-305
Main Authors Zhang, Guoqiang; Sun, Min; Wang, Jianfeng; Lei, Meng; Li, Chenji; Zhao, Duojun; Huang, Jun; Li, Wenjie; Li, Shuangli; Li, Jing; Yang, Jin; Luo, Yingfeng; Hu, Songnian; Zhang, Bing
Format Journal Article
Language English
Published England Blackwell Publishing Ltd 01.01.2019
Summary: In eukaryotes, alternative splicing (AS) greatly expands the diversity of transcripts. However, it is challenging to accurately determine full‐length splicing isoforms. Recently, more studies have taken advantage of Pacific Biosciences (PacBio) long‐read sequencing to identify full‐length transcripts. Nevertheless, the high error rate of PacBio reads seriously offsets the advantages of long reads, especially for accurately identifying splicing junctions. To best capitalize on the features of long reads, we used Illumina RNA‐seq reads to improve PacBio circular consensus sequence (CCS) quality and to validate splicing patterns in the rice transcriptome. We evaluated the impact of CCS accuracy on the number and the validation rate of splicing isoforms, and integrated a comprehensive pipeline of splicing transcripts analysis by Iso‐Seq and RNA‐seq (STAIR) to identify the full‐length multi‐exon isoforms in the rice seedling transcriptome (Oryza sativa L. ssp. japonica). STAIR discovered 11 733 full‐length multi‐exon isoforms, 6599 more than the SMRT Portal RS_IsoSeq pipeline did. Of these splicing isoforms, 4453 (37.9%) were missed in transcripts assembled from RNA‐seq reads, and 5204 (44.4%), including 268 multi‐exon long non‐coding RNAs (lncRNAs), were not reported in the MSU_osa1r7 annotation. Some randomly selected unreported splicing junctions were verified by polymerase chain reaction (PCR) amplification. In addition, we investigated alternative polyadenylation (APA) events in transcripts and identified 829 major polyadenylation [poly(A)] site clusters (PACs). The analysis of splicing isoforms and APA events will facilitate the annotation of the rice genome and studies on the expression and polyadenylation of AS genes in different developmental stages or growth conditions of rice.
Significance statement: In this study, we integrated PacBio full‐length cDNA sequencing and RNA‐seq into a STAIR pipeline to improve the discovery of splicing isoforms in rice, and further investigated alternative polyadenylation events in transcripts. The analysis of splicing isoforms and alternative polyadenylation events will facilitate the annotation of the rice genome and the understanding of the expression of alternative splicing genes in rice.
ISSN: 0960-7412, 1365-313X
DOI: 10.1111/tpj.14120