Symbiont-Screener: A reference-free tool to separate host sequences from symbionts for error-prone long reads

Bibliographic Details
Published in Frontiers in Marine Science Vol. 10
Main Authors Xu, Mengyang; Guo, Lidong; Qi, Yanwei; Shi, Chengcheng; Liu, Xiaochuan; Chen, Jianwei; Han, Jinglin; Deng, Li; Liu, Xin; Fan, Guangyi
Format Journal Article
Language English
Published Frontiers Media S.A. 30.01.2023
Summary: Metagenomic sequencing facilitates large-scale compositional analysis and functional characterization of complex microbial communities without cultivation. Recent advances in long-read sequencing use long-range information to simplify repeat-aware metagenomic assembly and complex genome binning. However, removing host-derived DNA sequences from the microbial community at read resolution remains methodologically challenging because of high sequencing error rates and the absence of reference genomes. Here we present Symbiont-Screener (https://github.com/BGI-Qingdao/Symbiont-Screener), a reference-free approach that uses a trio-based screening model to identify high-confidence host long reads among symbionts and contaminants despite low sequencing accuracy. The remaining host sequences are then automatically grouped by unsupervised clustering. When applied to both simulated and real long-read datasets, it achieves higher precision and recall in identifying the host’s raw reads than other tools and hence promises high-quality reconstruction of the host genome and associated metagenomes. Furthermore, we used both PacBio HiFi and nanopore long reads to separate the host’s sequences in a real host-microbe system, an algal-bacterial sample, and obtained a clear improvement in host assembly contiguity, completeness, and purity. More importantly, the residual symbiotic microbiomes show improved genomic profiling and assemblies after screening, which provides a solid data basis for downstream bioinformatic analyses and a novel perspective on symbiotic research.
ISSN: 2296-7745
DOI: 10.3389/fmars.2023.1087447