RECOUNT: EXPECTATION MAXIMIZATION BASED ERROR CORRECTION TOOL FOR NEXT GENERATION SEQUENCING DATA

Bibliographic Details
Published in Genome Informatics 2009, Vol. 23, no. 1, pp. 189-201
Main Authors WIJAYA, EDWARD; FRITH, MARTIN C.; SUZUKI, YUTAKA; HORTON, PAUL
Format Book Chapter, Journal Article
Language English
Published Japan, PUBLISHED BY IMPERIAL COLLEGE PRESS AND DISTRIBUTED BY WORLD SCIENTIFIC PUBLISHING CO, 01.10.2009
ISBN 9781848165625, 1848165633, 9781848165632, 9781908978011, 1848165625, 1908978015
ISSN 0919-9454
DOI 10.1142/9781848165632_0018

Abstract Next generation sequencing technologies enable rapid, large-scale production of sequence data sets. Unfortunately, these technologies also have a non-negligible sequencing error rate, which biases their outputs by introducing false reads and reducing the quantity of the real reads. Although methods developed for SAGE data can reduce these false counts to a considerable degree, until now they have not been implemented in a scalable way. Recently, a program named FREC has been developed to address this problem for next generation sequencing data. In this paper, we introduce RECOUNT, our implementation of an Expectation Maximization algorithm for tag count correction, and compare it to FREC. Using both the reference genome and simulated data, we find that RECOUNT performs as well as or better than FREC, while using much less memory (e.g. 5 GB vs. 75 GB). Furthermore, we report the first analysis of tag count correction with real data in the context of gene expression analysis. Our results show that tag count correction not only increases the number of mappable tags, but can make a real difference in the biological interpretation of next generation sequencing data. RECOUNT is an open-source C++ program available at http://seq.cbrc.jp/recount.
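The abstract does not spell out the algorithm, so the following Python sketch only illustrates the general idea of EM-based tag count correction; it is not RECOUNT's implementation. It assumes equal-length tags, a uniform per-base substitution error rate (`error_rate`), and that only observed tags within `max_dist` mismatches are candidate true sources of a read. The function name `em_correct_counts` and all of its parameters are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length tags."""
    return sum(x != y for x, y in zip(a, b))


def em_correct_counts(observed, error_rate=0.01, max_dist=1, iterations=50):
    """Toy EM tag count correction (illustrative only, not RECOUNT's code).

    observed: dict mapping tag sequence -> observed read count (> 0).
    Returns a dict mapping tag sequence -> expected true read count.
    """
    tags = list(observed)
    length = len(tags[0])
    total = sum(observed.values())

    def emit(true_tag, seen_tag):
        # Assumed error model: each base is misread with probability
        # error_rate, uniformly into one of the three other bases.
        d = hamming(true_tag, seen_tag)
        if d > max_dist:
            return 0.0
        return (error_rate / 3) ** d * (1 - error_rate) ** (length - d)

    # For each observed tag, the candidate true tags and P(seen | true).
    # This all-pairs scan is quadratic and only suitable for toy inputs.
    sources = {}
    for seen in tags:
        sources[seen] = {}
        for true in tags:
            p = emit(true, seen)
            if p > 0.0:
                sources[seen][true] = p

    expected = {t: float(observed[t]) for t in tags}
    for _ in range(iterations):
        proportions = {t: expected[t] / total for t in tags}
        updated = dict.fromkeys(tags, 0.0)
        for seen, n in observed.items():
            # E-step: posterior probability that a read seen as `seen`
            # really came from each candidate true tag.
            weights = {t: proportions[t] * p for t, p in sources[seen].items()}
            norm = sum(weights.values())
            # M-step: reassign the observed reads according to the posterior.
            for t, w in weights.items():
                updated[t] += n * w / norm
        expected = updated
    return expected
```

On a toy input such as `{"ACGTACGTAC": 1000, "ACGTACGTAA": 2, "TTTTTTTTTT": 50}`, most of the count of the rare one-mismatch neighbor is reassigned to the abundant tag, the unrelated tag keeps its count, and the total number of reads is preserved.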
Author_xml – sequence: 1
  givenname: EDWARD
  surname: WIJAYA
  fullname: WIJAYA, EDWARD
  email: e-wijaya@aist.go.jp
  organization: AIST, Computational Biology Research Center, 2-42 Aomi, Koutou-Ku, Tokyo 135-0064
– sequence: 2
  givenname: MARTIN C.
  surname: FRITH
  fullname: FRITH, MARTIN C.
  email: m.frith@aist.go.jp
  organization: AIST, Computational Biology Research Center, 2-42 Aomi, Koutou-Ku, Tokyo 135-0064
– sequence: 3
  givenname: YUTAKA
  surname: SUZUKI
  fullname: SUZUKI, YUTAKA
  email: ysuzuki@k.u-tokyo.ac.jp
  organization: Department of Medical Genome Sciences, Graduate School of Frontier Sciences, the University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562
– sequence: 4
  givenname: PAUL
  surname: HORTON
  fullname: HORTON, PAUL
  email: horton-p@aist.go.jp
  organization: AIST, Computational Biology Research Center, 2-42 Aomi, Koutou-Ku, Tokyo 135-0064
ContentType Book Chapter
Journal Article
Copyright Japanese Society for Bioinformatics
Discipline Biology
EISBN 1848165633
9781848165632
9781908978011
1908978015
Editor Lee, Sang Yup
Morishita, Shinichi
Sakakibara, Yasubumi
Editor_xml – sequence: 1
  givenname: Shinichi
  surname: Morishita
  fullname: Morishita, Shinichi
  organization: University of Tokyo
– sequence: 2
  givenname: Sang Yup
  surname: Lee
  fullname: Lee, Sang Yup
  organization: Korea Advanced Institute of Science & Technology
– sequence: 3
  givenname: Yasubumi
  surname: Sakakibara
  fullname: Sakakibara, Yasubumi
  organization: Keio University
EndPage 201
ExternalDocumentID 20180274
10.1142/9781848165632_0018
Genre Research Support, Non-U.S. Gov't
Journal Article
IsPeerReviewed false
IsScholarly true
Issue 1
Keywords transcriptomics
tag count correction
sequence analysis
next generation sequencing
Language English
MeetingName Proceedings of the 20th International Conference
PMID 20180274
PageCount 13
PublicationDate 2009-10-01
PublicationPlace Japan
PublicationSubtitle Genome Informatics Series Vol. 23
PublicationTitle Genome Informatics 2009
PublicationTitleAlternate Genome Inform
PublicationYear 2009
Publisher PUBLISHED BY IMPERIAL COLLEGE PRESS AND DISTRIBUTED BY WORLD SCIENTIFIC PUBLISHING CO
StartPage 189
SubjectTerms Genome
Models, Statistical
Part A Full Papers
Probability
Sequence Analysis, DNA - methods
Title RECOUNT: EXPECTATION MAXIMIZATION BASED ERROR CORRECTION TOOL FOR NEXT GENERATION SEQUENCING DATA
URI https://www.worldscientific.com/doi/10.1142/9781848165632_0018
https://www.ncbi.nlm.nih.gov/pubmed/20180274
https://www.proquest.com/docview/733115748
Volume 23