Mash Screen: high-throughput sequence containment estimation for genome discovery

Bibliographic Details
Published in Genome Biology Vol. 20; no. 1; p. 232
Main Authors Ondov, Brian D, Starrett, Gabriel J, Sappington, Anna, Kostic, Aleksandra, Koren, Sergey, Buck, Christopher B, Phillippy, Adam M
Format Journal Article
Language English
Published England BioMed Central 05.11.2019
BMC
Summary: The MinHash algorithm has proven effective for rapidly estimating the resemblance of two genomes or metagenomes. However, this method cannot reliably estimate the containment of a genome within a metagenome. Here, we describe an online algorithm capable of measuring the containment of genomes and proteomes within either assembled or unassembled sequencing read sets. We describe several use cases, including contamination screening and retrospective analysis of metagenomes for novel genome discovery. Using this tool, we provide containment estimates for every NCBI RefSeq genome within every SRA metagenome and demonstrate the identification of a novel polyomavirus species from a public metagenome.
ISSN:1474-760X
1474-7596
DOI:10.1186/s13059-019-1841-x