Assemblies of long-read metagenomes suffer from diverse errors


Bibliographic Details
Published in bioRxiv
Main Authors Trigodet, Florian; Sachdeva, Rohan; Banfield, Jillian F.; Eren, A. Murat
Format Paper
Language English
Published Cold Spring Harbor Laboratory 24.04.2025
Edition 1.1
Summary: Genomes from metagenomes have revolutionised our understanding of microbial diversity, ecology, and evolution, propelling advances in basic science, biomedicine, and biotechnology. Assembly algorithms that take advantage of increasingly available long-read sequencing technologies bring the recovery of complete genomes directly from metagenomes within reach. However, assessing the accuracy of the assembled long reads, especially from complex environments that often include poorly studied organisms, poses remarkable challenges. Here we show that erroneous reporting is pervasive among long-read assemblers and can take many forms, including multi-domain chimeras, prematurely circularized sequences, haplotyping errors, excessive repeats, and phantom sequences. Our study highlights the need for rigorous evaluation of the algorithms while they are in development, and options for users who may opt for more accurate reads than shorter runtimes.
Bibliography: Competing Interest Statement: The authors have declared no competing interest.
ISSN: 2692-8205
DOI: 10.1101/2025.04.22.649783