hybridSPAdes: an algorithm for hybrid assembly of short and long reads

Bibliographic Details
Published in Bioinformatics Vol. 32; no. 7; pp. 1009-1015
Main Authors Antipov, Dmitry; Korobeynikov, Anton; McLean, Jeffrey S; Pevzner, Pavel A
Format Journal Article
Language English
Published England Oxford University Press 01.04.2016

Abstract Motivation: Recent advances in single molecule real-time (SMRT) and nanopore sequencing technologies have enabled high-quality assemblies from long and inaccurate reads. However, these approaches require high coverage by long reads and remain expensive. On the other hand, inexpensive short-read technologies produce accurate but fragmented assemblies. Thus, a hybrid approach that assembles long reads (with low coverage) together with short reads has the potential to generate high-quality assemblies at reduced cost. Results: We describe the hybridSPAdes algorithm for assembling short and long reads and benchmark it on a variety of bacterial assembly projects. Our results demonstrate that hybridSPAdes generates accurate assemblies (even in projects with relatively low coverage by long reads), thus reducing the overall cost of genome sequencing. We further present the first complete assembly of a genome from single cells using SMRT reads. Availability and implementation: hybridSPAdes is implemented in C++ as part of the SPAdes genome assembler and is publicly available at http://bioinf.spbau.ru/en/spades Contact: d.antipov@spbu.ru Supplementary information: Supplementary data are available at Bioinformatics online.
Author_xml – sequence: 1
  givenname: Dmitry
  surname: Antipov
  fullname: Antipov, Dmitry
  organization: Center for Algorithmic Biotechnology, Institute for Translational Biomedicine
– sequence: 2
  givenname: Anton
  surname: Korobeynikov
  fullname: Korobeynikov, Anton
  organization: Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, Department of Statistical Modelling, St. Petersburg State University, St. Petersburg, Russia
– sequence: 3
  givenname: Jeffrey S
  surname: McLean
  fullname: McLean, Jeffrey S
  organization: Department of Periodontics, University of Washington, Seattle, WA 98195, USA
– sequence: 4
  givenname: Pavel A
  surname: Pevzner
  fullname: Pevzner, Pavel A
  organization: Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, Department of Computer Science and Engineering, University of California, San Diego, USA
BackLink https://www.ncbi.nlm.nih.gov/pubmed/26589280 (View this record in MEDLINE/PubMed)
ContentType Journal Article
Copyright The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
DOI 10.1093/bioinformatics/btv688
DatabaseName Medline
MEDLINE
MEDLINE (Ovid)
PubMed
Biotechnology Research Abstracts
Nucleic Acids Abstracts
Technology Research Database
Engineering Research Database
Biotechnology and BioEngineering Abstracts
MEDLINE - Academic
Computer and Information Systems Abstracts
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
PubMed Central (Full Participant titles)
Discipline Biology
EISSN 1367-4811
1460-2059
EndPage 1015
ExternalDocumentID PMC4907386
26589280
Genre Journal Article
ISSN 1367-4803
1367-4811
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 7
Language English
Notes Associate Editor: Inanc Birol
OpenAccessLink https://academic.oup.com/bioinformatics/article-pdf/32/7/1009/19568450/btv688.pdf
PMID 26589280
PageCount 7
PublicationCentury 2000
PublicationDate 2016-04-01
PublicationDecade 2010
PublicationPlace England
PublicationTitle Bioinformatics
PublicationYear 2016
Publisher Oxford University Press
StartPage 1009
SubjectTerms Algorithms
Assembling
Assembly
Bacteria
Base Sequence
Benchmarking
Bioinformatics
Chromosome Mapping
Cost engineering
Genome
Genomes
Original Papers
Sequence Analysis, DNA
URI https://www.ncbi.nlm.nih.gov/pubmed/26589280
https://www.proquest.com/docview/1785237832
https://www.proquest.com/docview/1787474918
https://www.proquest.com/docview/1808055757
https://pubmed.ncbi.nlm.nih.gov/PMC4907386
Volume 32