High-throughput annotation of full-length long noncoding RNAs with capture long-read sequencing

Bibliographic Details
Published in Nature Genetics Vol. 49; no. 12; pp. 1731-1740
Main Authors Lagarde, Julien; Uszczynska-Ratajczak, Barbara; Carbonell, Silvia; Pérez-Lluch, Sílvia; Abad, Amaya; Davis, Carrie; Gingeras, Thomas R; Frankish, Adam; Harrow, Jennifer; Guigo, Roderic; Johnson, Rory
Format Journal Article
Language English
Published New York Nature Publishing Group US 01.12.2017
Nature Publishing Group
Nature Research
ISSN 1061-4036
1546-1718
DOI 10.1038/ng.3988

Summary: RNA Capture Long Seq (CLS) is a new method for transcript annotation that combines targeted RNA capture with long-read sequencing. CLS reannotates GENCODE lncRNAs and increases the number of validated splice junctions and transcript models for targeted loci. Accurate annotation of genes and their transcripts is a foundation of genomics, but currently no annotation technique combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, and thousands more remain uncataloged, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), which combines targeted RNA capture with third-generation long-read sequencing. Here we present an experimental reannotation of the GENCODE intergenic lncRNA populations in matched human and mouse tissues that resulted in novel transcript models for 3,574 and 561 gene loci, respectively. CLS approximately doubled the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enabled us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus, CLS removes a long-standing bottleneck in transcriptome annotation and generates manual-quality full-length transcript models at high-throughput scales.
Present address: Department of Clinical Research, University of Bern, Murtenstrasse 35, 3010 Bern, Switzerland.
Present address: Centre of New Technologies, S. Banacha 2C, 02-097 Warsaw, Poland.
Present address: Illumina, Cambridge, UK.
Equal contribution