RECOUNT: EXPECTATION MAXIMIZATION BASED ERROR CORRECTION TOOL FOR NEXT GENERATION SEQUENCING DATA
Published in | Genome Informatics 2009 Vol. 23; no. 1; pp. 189 - 201 |
---|---|
Main Authors | WIJAYA, EDWARD; FRITH, MARTIN C.; SUZUKI, YUTAKA; HORTON, PAUL |
Format | Book Chapter; Journal Article |
Language | English |
Published | Japan: Imperial College Press (distributed by World Scientific Publishing Co), 2009-10-01 |
Subjects | Genome; Models, Statistical; Part A Full Papers; Probability; Sequence Analysis, DNA - methods |
ISBN | 9781848165625 1848165633 9781848165632 9781908978011 1848165625 1908978015 |
ISSN | 0919-9454 |
DOI | 10.1142/9781848165632_0018 |
Abstract | Next generation sequencing technologies enable rapid, large-scale production of sequence data sets. Unfortunately, these technologies also have a non-negligible sequencing error rate, which biases their outputs by introducing false reads and reducing the quantity of the real reads. Although methods developed for SAGE data can reduce these false counts to a considerable degree, until now they have not been implemented in a scalable way. Recently, a program named FREC has been developed to address this problem for next generation sequencing data.
In this paper, we introduce RECOUNT, our implementation of an Expectation Maximization algorithm for tag count correction, and compare it to FREC. Using both the reference genome and simulated data, we find that RECOUNT performs as well as or better than FREC, while using much less memory (e.g. 5 GB vs. 75 GB). Furthermore, we report the first analysis of tag count correction with real data in the context of gene expression analysis. Our results show that tag count correction not only increases the number of mappable tags, but can make a real difference in the biological interpretation of next generation sequencing data. RECOUNT is an open-source C++ program available at http://seq.cbrc.jp/recount. |
---|---|
Author | WIJAYA, EDWARD; FRITH, MARTIN C.; HORTON, PAUL; SUZUKI, YUTAKA |
Author_xml | 1. WIJAYA, EDWARD (e-wijaya@aist.go.jp), AIST, Computational Biology Research Center, 2-42 Aomi, Koutou-Ku, Tokyo 135-0064; 2. FRITH, MARTIN C. (m.frith@aist.go.jp), AIST, Computational Biology Research Center, 2-42 Aomi, Koutou-Ku, Tokyo 135-0064; 3. SUZUKI, YUTAKA (ysuzuki@k.u-tokyo.ac.jp), Department of Medical Genome Sciences, Graduate School of Frontier Sciences, the University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562; 4. HORTON, PAUL (horton-p@aist.go.jp), AIST, Computational Biology Research Center, 2-42 Aomi, Koutou-Ku, Tokyo 135-0064 |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/20180274 (MEDLINE/PubMed record) |
Copyright | Japanese Society for Bioinformatics |
Discipline | Biology |
EISBN | 1848165633 9781848165632 9781908978011 1908978015 |
Editor | Lee, Sang Yup; Morishita, Shinichi; Sakakibara, Yasubumi |
Editor_xml | 1. Morishita, Shinichi (University of Tokyo); 2. Lee, Sang Yup (Korea Advanced Institute of Science & Technology); 3. Sakakibara, Yasubumi (Keio University) |
EndPage | 201 |
ExternalDocumentID | 20180274 10.1142/9781848165632_0018 |
Genre | Research Support, Non-U.S. Gov't; Journal Article |
IsPeerReviewed | false |
IsScholarly | true |
Issue | 1 |
Keywords | transcriptomics; tag count correction; sequence analysis; next generation sequencing |
MeetingName | Proceedings of the 20th International Conference |
PMID | 20180274 |
PageCount | 13 |
PublicationSubtitle | Genome Informatics Series Vol. 23 |
PublicationTitle | Genome Informatics 2009 |
PublicationTitleAlternate | Genome Inform |
StartPage | 189 |
SubjectTerms | Genome; Models, Statistical; Part A Full Papers; Probability; Sequence Analysis, DNA - methods |
URI | https://www.worldscientific.com/doi/10.1142/9781848165632_0018 https://www.ncbi.nlm.nih.gov/pubmed/20180274 https://www.proquest.com/docview/733115748 |