RNA-Seq gene expression estimation with read mapping uncertainty
Motivation: RNA-Seq is a promising new technology for accurately measuring gene expression levels. Expression estimation with RNA-Seq requires the mapping of relatively short sequencing reads to a reference genome or transcript set. Because reads are generally shorter than transcripts from which the...
Saved in:
Published in | Bioinformatics Vol. 26; no. 4; pp. 493 - 500 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published |
Oxford
Oxford University Press
15.02.2010
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Motivation: RNA-Seq is a promising new technology for accurately measuring gene expression levels. Expression estimation with RNA-Seq requires the mapping of relatively short sequencing reads to a reference genome or transcript set. Because reads are generally shorter than transcripts from which they are derived, a single read may map to multiple genes and isoforms, complicating expression analyses. Previous computational methods either discard reads that map to multiple locations or allocate them to genes heuristically. Results: We present a generative statistical model and associated inference methods that handle read mapping uncertainty in a principled manner. Through simulations parameterized by real RNA-Seq data, we show that our method is more accurate than previous methods. Our improved accuracy is the result of handling read mapping uncertainty with a statistical model and the estimation of gene expression levels as the sum of isoform expression levels. Unlike previous methods, our method is capable of modeling non-uniform read distributions. Simulations with our method indicate that a read length of 20–25 bases is optimal for gene-level expression estimation from mouse and maize RNA-Seq data when sequencing throughput is fixed. Availability: An initial C++ implementation of our method that was used for the results presented in this article is available at http://deweylab.biostat.wisc.edu/rsem. Contact: cdewey@biostat.wisc.edu Supplementary information: Supplementary data are available at Bioinformatics on |
---|---|
Bibliography: | To whom correspondence should be addressed. istex:33C262CA405AB4AD6C01E52F21BBA3B75FCC38A9 ArticleID:btp692 ark:/67375/HXZ-5PZQD461-J Associate Editor: Joaquin Dopazo ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
ISSN: | 1367-4803 1460-2059 1367-4811 |
DOI: | 10.1093/bioinformatics/btp692 |