PacBio Long Reads Improve Metagenomic Assemblies, Gene Catalogs, and Genome Binning

Bibliographic Details
Published in Frontiers in genetics Vol. 11; p. 516269
Main Authors Xie, Haiying; Yang, Caiyun; Sun, Yamin; Igarashi, Yasuo; Jin, Tao; Luo, Feng
Format Journal Article
Language English
Published Frontiers Media S.A 08.09.2020
Summary: PacBio long-read sequencing offers several potential advantages for DNA assembly, including more complete gene profiling of metagenomic samples. However, its lower single-pass accuracy can make gene discovery and assembly difficult for low-abundance organisms. To evaluate the application and performance of PacBio long reads and Illumina HiSeq short reads in metagenomic analyses, we directly compared various assemblies of PacBio and Illumina sequencing reads from two anaerobic digestion microbiome samples taken from a biogas fermenter. The PacBio platform produced 1.58 million long reads (19.6 Gb) with an average length of 7,604 bp; the Illumina HiSeq platform produced 151.2 million read pairs (45.4 Gb). Hybrid assemblies using PacBio long reads and HiSeq contigs improved assembly statistics, increasing the average contig length, the contig N50 size, and the number of large contigs. Interestingly, depth-based hybrid assemblies generated a higher percentage of complete genes (98.86%) than assemblies based on HiSeq contigs only (40.29%), because the PacBio reads were long enough to cover many short repetitive elements and capture multiple genes in a single read. Additionally, incorporating PacBio long reads considerably reduced contig numbers and increased the completeness of genome reconstruction; genomes were poorly assembled and binned when using HiSeq data alone. From this comparison of PacBio long reads with Illumina HiSeq short reads on complex microbiome samples, we conclude that PacBio long reads can produce longer contigs, more complete genes, and better genome binning, thereby offering more information about metagenomic samples.
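The assembly statistics named in the summary (contig count, average contig length, contig N50) can be illustrated with a minimal sketch. It assumes the common definition of N50 as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name `contig_stats` and the toy contig lengths are hypothetical and are not taken from the article's data or pipeline.

```python
def contig_stats(lengths):
    """Return (contig count, mean length, N50) for contig lengths in bp."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    total = sum(ordered)
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        # Stop at the first contig where the cumulative length
        # reaches at least half of the total assembly length.
        if 2 * running >= total:
            n50 = length
            break
    return len(ordered), total / len(ordered), n50


# Hypothetical toy data: the same 4,500 bp split into six short contigs
# versus three longer ones, as might happen when long reads bridge gaps.
fragmented = [500, 600, 700, 800, 900, 1000]
merged = [1200, 1500, 1800]

print(contig_stats(fragmented))
print(contig_stats(merged))
```

With the total length held fixed, fewer and longer contigs raise both the mean length and the N50, which is the direction of improvement the summary reports for the hybrid assemblies.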
Edited by: Barbara J. Campbell, Clemson University, United States
Reviewed by: Xiyin Wang, North China University of Science and Technology, China; Wei Xu, Texas A&M University Corpus Christi, United States
These authors have contributed equally to this work
This article was submitted to Genomic Assay Technology, a section of the journal Frontiers in Genetics
ISSN: 1664-8021
DOI: 10.3389/fgene.2020.516269