Centrifuger: lossless compression of microbial genomes for efficient and accurate metagenomic sequence classification

Centrifuger is an efficient taxonomic classification method that compares sequencing reads against a microbial genome database. In Centrifuger, the Burrows-Wheeler transformed genome sequences are losslessly compressed using a novel scheme called run-block compression. Run-block compression achieves...

Bibliographic Details
Published in Genome Biology Vol. 25; no. 1; p. 106
Main Authors Song, Li; Langmead, Ben
Format Journal Article
Language English
Published England BioMed Central 25.04.2024
BMC
Abstract Centrifuger is an efficient taxonomic classification method that compares sequencing reads against a microbial genome database. In Centrifuger, the Burrows-Wheeler transformed genome sequences are losslessly compressed using a novel scheme called run-block compression. Run-block compression achieves sublinear space complexity and is effective at compressing diverse microbial databases like RefSeq while supporting fast rank queries. Combining this compression method with other strategies for compacting the Ferragina-Manzini (FM) index, Centrifuger reduces the memory footprint by half compared to other FM-index-based approaches. Furthermore, the lossless compression and the unconstrained match length help Centrifuger achieve greater accuracy than competing methods at lower taxonomic levels.
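The idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes, as a simplification, that run-block compression splits the Burrows-Wheeler transformed sequence into fixed-size blocks, stores any block made of one repeated character as that single character, and keeps every other block verbatim. The names `bwt`, `RunBlockSequence`, and `block_size` are hypothetical, and the per-block count table is a toy stand-in for the sampled rank structures a real FM index would use.

```python
from collections import Counter


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (toy inputs only)."""
    s = text + "$"  # '$' is the end marker and sorts before A, C, G, T
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


class RunBlockSequence:
    """Illustrative block-wise store with rank support (an assumption-laden sketch)."""

    def __init__(self, seq: str, block_size: int = 4):
        self.n = len(seq)
        self.block_size = block_size
        self.is_run = []         # True if the block is one repeated character
        self.blocks = []         # one character for run blocks, raw text otherwise
        self.prefix_counts = []  # character counts before each block, plus totals
        running = Counter()
        for start in range(0, self.n, block_size):
            chunk = seq[start:start + block_size]
            self.prefix_counts.append(dict(running))
            run = len(set(chunk)) == 1
            self.is_run.append(run)
            self.blocks.append(chunk[0] if run else chunk)
            running.update(chunk)
        self.prefix_counts.append(dict(running))  # totals for rank(c, n)

    def rank(self, c: str, i: int) -> int:
        """Return the number of occurrences of c in seq[:i]."""
        i = max(0, min(i, self.n))
        b, offset = divmod(i, self.block_size)
        count = self.prefix_counts[b].get(c, 0)
        if offset == 0:
            return count
        if self.is_run[b]:
            # A run block answers the query without touching any stored text.
            return count + (offset if self.blocks[b] == c else 0)
        return count + self.blocks[b][:offset].count(c)


if __name__ == "__main__":
    transformed = bwt("AAAAAAAACGTAAAAAAAA")
    index = RunBlockSequence(transformed, block_size=4)
    print(transformed, index.is_run, index.rank("A", len(transformed)))
```

In this sketch a run block costs one character regardless of its length, which is why sequences with long runs in their BWT, such as collections of similar genomes, would compress well under the stated assumption. Rank queries remain a table lookup plus at most one in-block scan.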
ArticleNumber 106
BackLink https://www.ncbi.nlm.nih.gov/pubmed/38664753 (View this record in MEDLINE/PubMed)
CitedBy_id crossref_primary_10_1038_s41467_025_57088_y
ContentType Journal Article
Copyright 2024. The Author(s).
2024. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.1186/s13059-024-03244-4
Discipline Biology
EISSN 1474-760X
EndPage 106
ExternalDocumentID oai_doaj_org_article_5be3381339a04fabb1c14e1cb01f84ee
38664753
10_1186_s13059_024_03244_4
Genre Research Support, U.S. Gov't, Non-P.H.S
Research Support, Non-U.S. Gov't
Journal Article
Research Support, N.I.H., Extramural
GrantInformation_xml – fundername: NHGRI NIH HHS
  grantid: R01HG011392
– fundername: NIGMS NIH HHS
  grantid: P20GM130454
– fundername: NIGMS NIH HHS
  grantid: R35GM139602
– fundername: NIGMS NIH HHS
  grantid: 3P20GM130454-05WS
ISSN 1474-760X
1474-7596
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Keywords FM-index
r-index
Compact data structure
Metagenomic
Language English
License 2024. The Author(s).
ORCID 0000-0002-0180-7426
OpenAccessLink https://doaj.org/article/5be3381339a04fabb1c14e1cb01f84ee
PMID 38664753
PQID 3054212095
PQPubID 2040232
PageCount 1
PublicationCentury 2000
PublicationDate 2024-04-25
PublicationDateYYYYMMDD 2024-04-25
PublicationDate_xml – month: 04
  year: 2024
  text: 2024-04-25
  day: 25
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
– name: London
PublicationTitle Genome Biology
PublicationTitleAlternate Genome Biol
PublicationYear 2024
Publisher BioMed Central
BMC
Publisher_xml – name: BioMed Central
– name: BMC
StartPage 106
SubjectTerms Chlamydia
Classification
Compact data structure
Compression
Data Compression - methods
FM-index
genome
Genome, Bacterial
Genome, Microbial
Genomes
memory
Metagenomic
Metagenomics
Metagenomics - methods
r-index
Sequence Analysis, DNA - methods
Software
Taxonomy
Title Centrifuger: lossless compression of microbial genomes for efficient and accurate metagenomic sequence classification
URI https://www.ncbi.nlm.nih.gov/pubmed/38664753
https://www.proquest.com/docview/3054212095
https://www.proquest.com/docview/3047944317
https://www.proquest.com/docview/3153655723
https://doaj.org/article/5be3381339a04fabb1c14e1cb01f84ee
Volume 25