Assembling single-cell genomes and mini-metagenomes from chimeric MDA products

Recent advances in single-cell genomics provide an alternative to largely gene-centric metagenomics studies, enabling whole-genome sequencing of uncultivated bacteria. However, single-cell assembly projects are challenging due to (i) the highly nonuniform read coverage and (ii) a greatly elevated nu...

Bibliographic Details
Published in Journal of computational biology Vol. 20; no. 10; p. 714
Main Authors Nurk, Sergey, Bankevich, Anton, Antipov, Dmitry, Gurevich, Alexey A, Korobeynikov, Anton, Lapidus, Alla, Prjibelski, Andrey D, Pyshkin, Alexey, Sirotkin, Alexander, Sirotkin, Yakov, Stepanauskas, Ramunas, Clingenpeel, Scott R, Woyke, Tanja, McLean, Jeffrey S, Lasken, Roger, Tesler, Glenn, Alekseyev, Max A, Pevzner, Pavel A
Format Journal Article
Language English
Published United States 01.10.2013
ISSN 1557-8666
DOI 10.1089/cmb.2013.0084

Abstract Recent advances in single-cell genomics provide an alternative to largely gene-centric metagenomics studies, enabling whole-genome sequencing of uncultivated bacteria. However, single-cell assembly projects are challenging due to (i) the highly nonuniform read coverage and (ii) a greatly elevated number of chimeric reads and read pairs. While recently developed single-cell assemblers have addressed the former challenge, methods for assembling highly chimeric reads remain poorly explored. We present algorithms for identifying chimeric edges and resolving complex bulges in de Bruijn graphs, which significantly improve single-cell assemblies. We further describe applications of the single-cell assembler SPAdes to a new approach for capturing and sequencing "microbial dark matter" that forms small pools of randomly selected single cells (called a mini-metagenome) and further sequences all genomes from the mini-metagenome at once. On single-cell bacterial datasets, SPAdes improves on the recently developed E+V-SC and IDBA-UD assemblers specifically designed for single-cell sequencing. For standard (cultivated monostrain) datasets, SPAdes also improves on A5, ABySS, CLC, EULER-SR, Ray, SOAPdenovo, and Velvet. Thus, recently developed single-cell assemblers not only enable single-cell sequencing, but also improve on conventional assemblers on their own turf. SPAdes is available for free online download under a GPLv2 license.
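The abstract mentions identifying chimeric edges in de Bruijn graphs without describing the procedure. As a rough illustration only, the sketch below builds a de Bruijn graph from reads (k-mers as edges, (k-1)-mers as nodes) and flags low-coverage edges that bridge two otherwise well-supported paths. This is a deliberately simplified heuristic and is not the SPAdes algorithm; the function names, the `max_cov` cutoff, and the toy reads are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def build_de_bruijn(reads, k):
    """Return a Counter mapping each k-mer (a graph edge) to its multiplicity."""
    edges = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            edges[read[i:i + k]] += 1
    return edges


def low_coverage_junction_edges(edges, max_cov):
    """Flag candidate chimeric edges (hypothetical heuristic).

    An edge is flagged when its coverage is at most max_cov, its start
    node has another outgoing edge, and its end node has another
    incoming edge, i.e. it looks like a weak bridge between two paths.
    """
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for kmer in edges:
        out_deg[kmer[:-1]] += 1  # start node is the (k-1)-mer prefix
        in_deg[kmer[1:]] += 1    # end node is the (k-1)-mer suffix
    return sorted(
        kmer
        for kmer, cov in edges.items()
        if cov <= max_cov and out_deg[kmer[:-1]] > 1 and in_deg[kmer[1:]] > 1
    )


# Toy data: two well-covered regions plus one read joining them.
reads = ["AAGCT"] * 3 + ["TTCGA"] * 3 + ["AAGCGA"]
edges = build_de_bruijn(reads, k=3)
candidates = low_coverage_junction_edges(edges, max_cov=1)
```

A fixed coverage cutoff like `max_cov` is exactly what the highly nonuniform coverage of single-cell data makes unreliable, so this sketch should be read as a picture of the problem rather than of the paper's solution.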
Author_xml – sequence: 1
  givenname: Sergey
  surname: Nurk
  fullname: Nurk, Sergey
  organization: 1 Algorithmic Biology Laboratory, St. Petersburg Academic University , Russian Academy of Sciences, St. Petersburg, Russia
– sequence: 2
  givenname: Anton
  surname: Bankevich
  fullname: Bankevich, Anton
– sequence: 3
  givenname: Dmitry
  surname: Antipov
  fullname: Antipov, Dmitry
– sequence: 4
  givenname: Alexey A
  surname: Gurevich
  fullname: Gurevich, Alexey A
– sequence: 5
  givenname: Anton
  surname: Korobeynikov
  fullname: Korobeynikov, Anton
– sequence: 6
  givenname: Alla
  surname: Lapidus
  fullname: Lapidus, Alla
– sequence: 7
  givenname: Andrey D
  surname: Prjibelski
  fullname: Prjibelski, Andrey D
– sequence: 8
  givenname: Alexey
  surname: Pyshkin
  fullname: Pyshkin, Alexey
– sequence: 9
  givenname: Alexander
  surname: Sirotkin
  fullname: Sirotkin, Alexander
– sequence: 10
  givenname: Yakov
  surname: Sirotkin
  fullname: Sirotkin, Yakov
– sequence: 11
  givenname: Ramunas
  surname: Stepanauskas
  fullname: Stepanauskas, Ramunas
– sequence: 12
  givenname: Scott R
  surname: Clingenpeel
  fullname: Clingenpeel, Scott R
– sequence: 13
  givenname: Tanja
  surname: Woyke
  fullname: Woyke, Tanja
– sequence: 14
  givenname: Jeffrey S
  surname: McLean
  fullname: McLean, Jeffrey S
– sequence: 15
  givenname: Roger
  surname: Lasken
  fullname: Lasken, Roger
– sequence: 16
  givenname: Glenn
  surname: Tesler
  fullname: Tesler, Glenn
– sequence: 17
  givenname: Max A
  surname: Alekseyev
  fullname: Alekseyev, Max A
– sequence: 18
  givenname: Pavel A
  surname: Pevzner
  fullname: Pevzner, Pavel A
BackLink https://www.ncbi.nlm.nih.gov/pubmed/24093227$$D View this record in MEDLINE/PubMed
ContentType Journal Article
DBID CGR
CUY
CVF
ECM
EIF
NPM
DOI 10.1089/cmb.2013.0084
DatabaseName Medline
MEDLINE
MEDLINE (Ovid)
MEDLINE
MEDLINE
PubMed
DatabaseTitle MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
DatabaseTitleList MEDLINE
Discipline Biology
Mathematics
EISSN 1557-8666
ExternalDocumentID 24093227
Genre Research Support, U.S. Gov't, Non-P.H.S
Research Support, Non-U.S. Gov't
Journal Article
Research Support, N.I.H., Extramural
GrantInformation_xml – fundername: NIGMS NIH HHS
  grantid: 1R01GM095373
– fundername: NCRR NIH HHS
  grantid: 3P41RR024851-02S1
– fundername: NHGRI NIH HHS
  grantid: 2R01HG003647
– fundername: NIGMS NIH HHS
  grantid: R01 GM095373
IngestDate Thu Apr 03 06:56:45 EDT 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 10
Language English
LinkModel OpenURL
PMID 24093227
ParticipantIDs pubmed_primary_24093227
PublicationCentury 2000
PublicationDate 2013-10-01
PublicationDateYYYYMMDD 2013-10-01
PublicationDate_xml – month: 10
  year: 2013
  text: 2013-10-01
  day: 01
PublicationDecade 2010
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle Journal of computational biology
PublicationTitleAlternate J Comput Biol
PublicationYear 2013
SourceID pubmed
SourceType Index Database
StartPage 714
SubjectTerms Algorithms
Base Composition
Computational Biology
Contig Mapping - methods
DNA, Bacterial - genetics
DNA, Concatenated - genetics
Escherichia coli - genetics
Gene Library
Genome, Bacterial
High-Throughput Nucleotide Sequencing
Nucleic Acid Amplification Techniques
Pedobacter - genetics
Prochlorococcus - genetics
Sequence Analysis, DNA
Single-Cell Analysis
Title Assembling single-cell genomes and mini-metagenomes from chimeric MDA products
URI https://www.ncbi.nlm.nih.gov/pubmed/24093227
Volume 20