SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing

Bibliographic Details
Published in Journal of computational biology Vol. 19; no. 5; p. 455
Main Authors Bankevich, Anton; Nurk, Sergey; Antipov, Dmitry; Gurevich, Alexey A; Dvorkin, Mikhail; Kulikov, Alexander S; Lesin, Valery M; Nikolenko, Sergey I; Pham, Son; Prjibelski, Andrey D; Pyshkin, Alexey V; Sirotkin, Alexander V; Vyahhi, Nikolay; Tesler, Glenn; Alekseyev, Max A; Pevzner, Pavel A
Format Journal Article
Language English
Published United States 01.05.2012
ISSN 1557-8666
DOI 10.1089/cmb.2012.0021

Abstract The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.
Author_xml – sequence: 1
  givenname: Anton
  surname: Bankevich
  fullname: Bankevich, Anton
  organization: Algorithmic Biology Laboratory, St. Petersburg Academic University, Russian Academy of Sciences, St. Petersburg, Russia
– sequence: 2
  givenname: Sergey
  surname: Nurk
  fullname: Nurk, Sergey
– sequence: 3
  givenname: Dmitry
  surname: Antipov
  fullname: Antipov, Dmitry
– sequence: 4
  givenname: Alexey A
  surname: Gurevich
  fullname: Gurevich, Alexey A
– sequence: 5
  givenname: Mikhail
  surname: Dvorkin
  fullname: Dvorkin, Mikhail
– sequence: 6
  givenname: Alexander S
  surname: Kulikov
  fullname: Kulikov, Alexander S
– sequence: 7
  givenname: Valery M
  surname: Lesin
  fullname: Lesin, Valery M
– sequence: 8
  givenname: Sergey I
  surname: Nikolenko
  fullname: Nikolenko, Sergey I
– sequence: 9
  givenname: Son
  surname: Pham
  fullname: Pham, Son
– sequence: 10
  givenname: Andrey D
  surname: Prjibelski
  fullname: Prjibelski, Andrey D
– sequence: 11
  givenname: Alexey V
  surname: Pyshkin
  fullname: Pyshkin, Alexey V
– sequence: 12
  givenname: Alexander V
  surname: Sirotkin
  fullname: Sirotkin, Alexander V
– sequence: 13
  givenname: Nikolay
  surname: Vyahhi
  fullname: Vyahhi, Nikolay
– sequence: 14
  givenname: Glenn
  surname: Tesler
  fullname: Tesler, Glenn
– sequence: 15
  givenname: Max A
  surname: Alekseyev
  fullname: Alekseyev, Max A
– sequence: 16
  givenname: Pavel A
  surname: Pevzner
  fullname: Pevzner, Pavel A
BackLink https://www.ncbi.nlm.nih.gov/pubmed/22506599 (view this record in MEDLINE/PubMed)
ContentType Journal Article
Discipline Biology
Mathematics
EISSN 1557-8666
ExternalDocumentID 22506599
Genre Research Support, Non-U.S. Gov't
Journal Article
Research Support, N.I.H., Extramural
GrantInformation_xml – fundername: NCRR NIH HHS
  grantid: 3P41RR024851-02S1
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 5
Language English
PMID 22506599
ParticipantIDs pubmed_primary_22506599
PublicationCentury 2000
PublicationDate 2012-May
PublicationDateYYYYMMDD 2012-05-01
PublicationDate_xml – month: 05
  year: 2012
  text: 2012-May
PublicationDecade 2010
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle Journal of computational biology
PublicationTitleAlternate J Comput Biol
PublicationYear 2012
StartPage 455
SubjectTerms Algorithms
Bacteria - genetics
Genome, Bacterial
Metagenomics - methods
Sequence Analysis, DNA - methods
Single-Cell Analysis - methods
Title SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing
URI https://www.ncbi.nlm.nih.gov/pubmed/22506599
Volume 19