RECOUNT: EXPECTATION MAXIMIZATION BASED ERROR CORRECTION TOOL FOR NEXT GENERATION SEQUENCING DATA
Published in | Genome Informatics 2009 Vol. 23; no. 1; pp. 189-201 |
---|---|
Format | Book Chapter, Journal Article |
Language | English |
Published | Japan: Imperial College Press, distributed by World Scientific Publishing Co, 01.10.2009 |
ISBN | 9781848165625 1848165633 9781848165632 9781908978011 1848165625 1908978015 |
ISSN | 0919-9454 |
DOI | 10.1142/9781848165632_0018 |
Summary: | Next generation sequencing technologies enable rapid, large-scale production of sequence data sets. Unfortunately, these technologies also have a non-negligible sequencing error rate, which biases their outputs by introducing false reads and reducing the quantity of the real reads. Although methods developed for SAGE data can reduce these false counts to a considerable degree, until now they have not been implemented in a scalable way. Recently, a program named FREC has been developed to address this problem for next generation sequencing data.
In this paper, we introduce RECOUNT, our implementation of an Expectation Maximization algorithm for tag count correction, and compare it to FREC. Using both the reference genome and simulated data, we find that RECOUNT performs as well as or better than FREC, while using much less memory (e.g. 5 GB vs. 75 GB). Furthermore, we report the first analysis of tag count correction with real data in the context of gene expression analysis. Our results show that tag count correction not only increases the number of mappable tags, but can make a real difference in the biological interpretation of next generation sequencing data. RECOUNT is an open-source C++ program available at http://seq.cbrc.jp/recount. |
---|---|