SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing
Published in | Journal of computational biology Vol. 19; no. 5; p. 455 |
---|---|
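Main Authors | Bankevich, Anton; Nurk, Sergey; Antipov, Dmitry; Gurevich, Alexey A; Dvorkin, Mikhail; Kulikov, Alexander S; Lesin, Valery M; Nikolenko, Sergey I; Pham, Son; Prjibelski, Andrey D; Pyshkin, Alexey V; Sirotkin, Alexander V; Vyahhi, Nikolay; Tesler, Glenn; Alekseyev, Max A; Pevzner, Pavel A |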
Format | Journal Article |
Language | English |
Published | United States, 01.05.2012 |
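Subjects | Algorithms; Bacteria - genetics; Genome, Bacterial; Metagenomics - methods; Sequence Analysis, DNA - methods; Single-Cell Analysis - methods |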
ISSN | 1557-8666 |
DOI | 10.1089/cmb.2012.0021 |
Abstract | The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software. |
---|---|
Author | Antipov, Dmitry Bankevich, Anton Lesin, Valery M Pham, Son Tesler, Glenn Gurevich, Alexey A Vyahhi, Nikolay Kulikov, Alexander S Pevzner, Pavel A Sirotkin, Alexander V Alekseyev, Max A Dvorkin, Mikhail Pyshkin, Alexey V Nurk, Sergey Prjibelski, Andrey D Nikolenko, Sergey I |
Author_xml | – sequence: 1 givenname: Anton surname: Bankevich fullname: Bankevich, Anton organization: Algorithmic Biology Laboratory, St. Petersburg Academic University, Russian Academy of Sciences, St. Petersburg, Russia – sequence: 2 givenname: Sergey surname: Nurk fullname: Nurk, Sergey – sequence: 3 givenname: Dmitry surname: Antipov fullname: Antipov, Dmitry – sequence: 4 givenname: Alexey A surname: Gurevich fullname: Gurevich, Alexey A – sequence: 5 givenname: Mikhail surname: Dvorkin fullname: Dvorkin, Mikhail – sequence: 6 givenname: Alexander S surname: Kulikov fullname: Kulikov, Alexander S – sequence: 7 givenname: Valery M surname: Lesin fullname: Lesin, Valery M – sequence: 8 givenname: Sergey I surname: Nikolenko fullname: Nikolenko, Sergey I – sequence: 9 givenname: Son surname: Pham fullname: Pham, Son – sequence: 10 givenname: Andrey D surname: Prjibelski fullname: Prjibelski, Andrey D – sequence: 11 givenname: Alexey V surname: Pyshkin fullname: Pyshkin, Alexey V – sequence: 12 givenname: Alexander V surname: Sirotkin fullname: Sirotkin, Alexander V – sequence: 13 givenname: Nikolay surname: Vyahhi fullname: Vyahhi, Nikolay – sequence: 14 givenname: Glenn surname: Tesler fullname: Tesler, Glenn – sequence: 15 givenname: Max A surname: Alekseyev fullname: Alekseyev, Max A – sequence: 16 givenname: Pavel A surname: Pevzner fullname: Pevzner, Pavel A |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/22506599$$D View this record in MEDLINE/PubMed |
ContentType | Journal Article |
DBID | CGR CUY CVF ECM EIF NPM |
DOI | 10.1089/cmb.2012.0021 |
DatabaseName | Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed |
DatabaseTitle | MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) |
DatabaseTitleList | MEDLINE |
Database_xml | – sequence: 1 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: EIF name: MEDLINE url: https://proxy.k.utb.cz/login?url=https://www.webofscience.com/wos/medline/basic-search sourceTypes: Index Database |
DeliveryMethod | no_fulltext_linktorsrc |
Discipline | Biology Mathematics |
EISSN | 1557-8666 |
ExternalDocumentID | 22506599 |
Genre | Research Support, Non-U.S. Gov't Journal Article Research Support, N.I.H., Extramural |
GrantInformation_xml | – fundername: NCRR NIH HHS grantid: 3P41RR024851-02S1 |
IngestDate | Thu Apr 03 06:57:10 EDT 2025 |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 5 |
Language | English |
LinkModel | OpenURL |
PMID | 22506599 |
ParticipantIDs | pubmed_primary_22506599 |
PublicationCentury | 2000 |
PublicationDate | 2012-May |
PublicationDateYYYYMMDD | 2012-05-01 |
PublicationDate_xml | – month: 05 year: 2012 text: 2012-May |
PublicationDecade | 2010 |
PublicationPlace | United States |
PublicationPlace_xml | – name: United States |
PublicationTitle | Journal of computational biology |
PublicationTitleAlternate | J Comput Biol |
PublicationYear | 2012 |
SSID | ssj0013607 |
Score | 2.602873 |
SourceID | pubmed |
SourceType | Index Database |
StartPage | 455 |
SubjectTerms | Algorithms Bacteria - genetics Genome, Bacterial Metagenomics - methods Sequence Analysis, DNA - methods Single-Cell Analysis - methods |
Title | SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing |
URI | https://www.ncbi.nlm.nih.gov/pubmed/22506599 |
Volume | 19 |
inHoldings | 1 |
linkProvider | National Library of Medicine |