Recovering genomic clusters of secondary metabolites from lakes: a Metagenomics 2.0 approach

Bibliographic Details
Published in: bioRxiv
Main Authors: Cuadrat, Rafael; Ionescu, Danny; Davila, Alberto M R; Grossart, Hans-Peter
Format: Paper
Language: English
Published: Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 31.08.2017
Summary: Background: Metagenomic approaches have become increasingly popular over the past decades owing to falling DNA sequencing costs and advances in bioinformatics. So far, however, recovering long genes encoding secondary metabolism remains a major challenge. Metagenome assemblies are often of poor quality, especially in environments with high microbial diversity, where sequence coverage is low and the complexity of natural communities is high. Recently, new and improved algorithms for binning environmental reads and contigs have been developed to overcome such limitations. Some of these algorithms use a similarity detection approach to classify the obtained reads into taxonomic units and to assemble draft genomes. This approach, however, is limited because it can classify only sequences similar to those already available (and well classified) in the databases. In this work, we used draft genomes from Lake Stechlin, north-eastern Germany, recovered by MetaBat, an efficient binning tool that integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning. These genomes were screened for secondary metabolism genes, such as polyketide synthases (PKS) and non-ribosomal peptide synthases (NRPS), using the Anti-SMASH and NAPDOS workflows. Results: With this approach we identified 243 secondary metabolite clusters from 121 genomes recovered from the lake samples. A total of 18 NRPS, 19 PKS and 3 hybrid PKS/NRPS clusters were found. In addition, it was possible to predict the partial structure of several secondary metabolite clusters, allowing for taxonomic classifications and phylogenetic inferences. Conclusions: Our approach shows great potential for recovering and studying secondary metabolite genes from any aquatic ecosystem.
DOI: 10.1101/183061