proovframe: frameshift-correction for long-read (meta)genomics

Bibliographic Details
Published in bioRxiv
Main Authors Hackl, Thomas; Trigodet, Florian; Eren, A Murat; Biller, Steven J; Eppley, John M; Luo, Elaine; Burger, Andrew; Delong, Edward F; Fischer, Matthias G
Format Paper
Language English
Published Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 24.08.2021
Cold Spring Harbor Laboratory
Edition 1.1
Abstract Long-read sequencing technologies hold big promises for the genomic analysis of complex samples such as microbial communities. Yet, despite improving accuracy, basic gene prediction on long-read data is still often impaired by frameshifts resulting from small indels. Consensus polishing using either complementary short reads or to a lesser extent the long reads themselves can mitigate this effect but requires universally high sequencing depth, which is difficult to achieve in complex samples where the majority of community members are rare. Here we present proovframe, a software implementing an alternative approach to overcome frameshift errors in long-read assemblies and raw long reads. We utilize protein-to-nucleotide alignments against reference databases to pinpoint indels in contigs or reads and correct them by deleting or inserting 1-2 bases, thereby conservatively restoring reading-frame fidelity in aligned regions. Using simulated and real-world benchmark data we show that proovframe performs comparably to short-read-based polishing on assembled data, works well with remote protein homologs, and can even be applied to raw reads directly. Together, our results demonstrate that protein-guided frameshift correction significantly improves the analyzability of long-read data both in combination with and as an alternative to common polishing strategies. Proovframe is available from https://github.com/thackl/proovframe.
Competing Interest Statement The authors have declared no competing interest.
Footnotes
* https://github.com/thackl/proovframe
* http://github.com/thackl/proovframe-benchmark
* https://doi.org/10.5281/zenodo.5164669
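The correction step described in the abstract (deleting or inserting 1-2 bases at positions pinpointed by a protein-to-nucleotide alignment) can be illustrated with a minimal Python sketch. This is not proovframe's implementation: the function name `fix_frameshifts`, the `(pos, delta)` input format, and the use of `N` as the inserted placeholder base are all assumptions made for illustration. In practice the positions would come from an aligner that reports frameshifts; here they are supplied by hand.

```python
def fix_frameshifts(seq, shifts, filler="N"):
    """Return seq with small indels repaired at the given positions.

    shifts is a list of (pos, delta) pairs (hypothetical format):
      pos   -- 0-based index into the uncorrected sequence
      delta -- +1/+2: that many surplus bases start at pos and are deleted
               -1/-2: that many bases are missing; filler is inserted before pos
    """
    out, prev = [], 0
    for pos, delta in sorted(shifts):
        if abs(delta) not in (1, 2):
            raise ValueError("only 1-2 base corrections are supported")
        if pos < prev or pos > len(seq):
            raise ValueError("overlapping or out-of-range correction")
        out.append(seq[prev:pos])        # copy the untouched stretch
        if delta > 0:
            prev = pos + delta           # skip the surplus bases
        else:
            out.append(filler * -delta)  # pad the missing bases
            prev = pos
    out.append(seq[prev:])
    return "".join(out)


# Intended reading frame: ATG GCT AAA GGT TAA
with_insertion = "ATGGCTCAAAGGTTAA"  # one surplus C at index 6
with_deletion = "ATGGCTAAGGTTAA"     # one A lost from the AAA codon

print(fix_frameshifts(with_insertion, [(6, 1)]))
print(fix_frameshifts(with_deletion, [(7, -1)]))
```

Both corrected sequences are again a multiple of three in length, so the downstream codons (GGT, TAA) fall back into frame. In the deletion case the affected codon still contains a placeholder base, which matches the abstract's description of the approach as a conservative restoration of the reading frame, not a recovery of the true base.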
Author_xml – sequence: 1
  givenname: Thomas
  surname: Hackl
  fullname: Hackl, Thomas
– sequence: 2
  givenname: Florian
  surname: Trigodet
  fullname: Trigodet, Florian
– sequence: 3
  givenname: A
  surname: Eren
  middlename: Murat
  fullname: Eren, A Murat
– sequence: 4
  givenname: Steven
  surname: Biller
  middlename: J
  fullname: Biller, Steven J
– sequence: 5
  givenname: John
  surname: Eppley
  middlename: M
  fullname: Eppley, John M
– sequence: 6
  givenname: Elaine
  surname: Luo
  fullname: Luo, Elaine
– sequence: 7
  givenname: Andrew
  surname: Burger
  fullname: Burger, Andrew
– sequence: 8
  givenname: Edward
  surname: Delong
  middlename: F
  fullname: Delong, Edward F
– sequence: 9
  givenname: Matthias
  surname: Fischer
  middlename: G
  fullname: Fischer, Matthias G
ContentType Paper
Copyright 2021. This article is published under http://creativecommons.org/licenses/by-nc/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
2021, Posted by Cold Spring Harbor Laboratory
DOI 10.1101/2021.08.23.457338
Discipline Biology
EISSN 2692-8205
Edition 1.1
ExternalDocumentID 2021.08.23.457338v1
Genre Working Paper/Pre-Print
ISSN 2692-8205
IngestDate Tue Jan 07 18:56:43 EST 2025
Fri Jul 25 09:18:41 EDT 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
License This pre-print is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), CC BY-NC 4.0, as described at http://creativecommons.org/licenses/by-nc/4.0
Competing Interest Statement: The authors have declared no competing interest.
ORCID 0000-0001-5334-1704
0000-0001-9013-4827
0000-0003-0238-7571
0000-0002-3088-4965
0000-0002-4933-2896
0000-0002-0022-320X
0000-0002-2638-823X
0000-0002-4014-3626
OpenAccessLink https://www.biorxiv.org/content/10.1101/2021.08.23.457338
PQID 2563946434
PQPubID 2050091
PageCount 15
PublicationDate 2021-08-24
PublicationPlace Cold Spring Harbor
PublicationTitle bioRxiv
PublicationYear 2021
Publisher Cold Spring Harbor Laboratory Press
Cold Spring Harbor Laboratory
SubjectTerms Bioinformatics
Genomic analysis
Proteins
Title proovframe: frameshift-correction for long-read (meta)genomics
URI https://www.proquest.com/docview/2563946434
https://www.biorxiv.org/content/10.1101/2021.08.23.457338