Within-species contamination of bacterial whole-genome sequence data has a greater influence on clustering analyses than between-species contamination
Published in | Genome Biology Vol. 20; no. 1; p. 286 |
---|---|
Format | Journal Article |
Language | English |
Published | England: BioMed Central (BMC), 18.12.2019 |
Summary: | Although it is assumed that contamination in bacterial whole-genome sequencing causes errors, the influences of contamination on clustering analyses, such as single-nucleotide polymorphism discovery, phylogenetics, and multi-locus sequencing typing, have not been quantified. By developing and analyzing 720 Listeria monocytogenes, Salmonella enterica, and Escherichia coli short-read datasets, we demonstrate that within-species contamination causes errors that confound clustering analyses, while between-species contamination generally does not. Contaminant reads mapping to references or becoming incorporated into chimeric sequences during assembly are the sources of those errors. Contamination sufficient to influence clustering analyses is present in public sequence databases. |
---|---|
ISSN: | 1474-760X, 1474-7596 |
DOI: | 10.1186/s13059-019-1914-x |