ContScout: sensitive detection and removal of contamination from annotated genomes

Bibliographic Details
Published in: Nature Communications Vol. 15; no. 1; pp. 936 - 12
Main Authors: Bálint, Balázs; Merényi, Zsolt; Hegedüs, Botond; Grigoriev, Igor V.; Hou, Zhihao; Földi, Csenge; Nagy, László G.
Format: Journal Article
Language: English
Published: London: Nature Publishing Group UK, 31.01.2024
Publisher: Nature Publishing Group; Nature Portfolio
Summary: Contamination of genomes is an increasingly recognized problem affecting several downstream applications, from comparative evolutionary genomics to metagenomics. Here we introduce ContScout, a precise tool for eliminating foreign sequences from annotated genomes. It achieves high specificity and sensitivity on synthetic benchmark data even when the contaminant is a closely related species, outperforms competing tools, and can distinguish horizontal gene transfer from contamination. A screen of 844 eukaryotic genomes for contamination identified bacteria as the most common source, followed by fungi and plants. Furthermore, we show that contaminants in ancestral genome reconstructions lead to erroneous early origins of genes and inflate gene loss rates, leading to a false notion of complex ancestral genomes. Taken together, we offer here a tool for sensitive removal of foreign proteins, identify and remove contaminants from diverse eukaryotic genomes, and evaluate their impact on phylogenomic analyses.
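
The summary's point that contaminants produce erroneous early gene origins and inflated loss counts can be illustrated with a toy reconstruction. The sketch below is a minimal illustration under stated assumptions. It uses Dollo parsimony, where a gene is gained once at the lowest common ancestor of all species carrying it and can afterwards only be lost, on a hypothetical five-species tree. The tree, the species labels, and helpers such as `reconstruct` are invented for illustration. They are not ContScout's algorithm or the reconstruction method used in the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy species tree (((A,B),(C,D)),E), stored as child -> parent.
PARENT: Dict[str, str] = {
    "A": "AB", "B": "AB", "C": "CD", "D": "CD",
    "AB": "ABCD", "CD": "ABCD", "ABCD": "root", "E": "root",
}


def children(node: str) -> List[str]:
    """Direct children of a node, sorted for deterministic traversal."""
    return sorted(c for c, p in PARENT.items() if p == node)


def path_to_root(node: str) -> List[str]:
    """Nodes from `node` up to the root, inclusive."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lca(species: Set[str]) -> str:
    """Lowest common ancestor of a non-empty set of leaves."""
    paths = [path_to_root(s) for s in sorted(species)]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up from any leaf, the first shared node is the LCA.
    return next(n for n in paths[0] if n in shared)


def leaves_under(node: str) -> Set[str]:
    """All leaves in the clade rooted at `node`."""
    kids = children(node)
    if not kids:
        return {node}
    return set().union(*(leaves_under(k) for k in kids))


def dollo_losses(node: str, present: Set[str]) -> int:
    """Minimum number of losses below `node`, given the gene exists at `node`."""
    if not leaves_under(node) & present:
        return 1  # whole clade lacks the gene: one loss on the branch into it
    return sum(dollo_losses(k, present) for k in children(node))


def reconstruct(present: Set[str]) -> Tuple[str, int]:
    """Hypothetical helper: (inferred origin node, inferred number of losses)."""
    origin = lca(present)
    return origin, dollo_losses(origin, present)


print(reconstruct({"A", "B"}))       # true family: ('AB', 0)
print(reconstruct({"A", "B", "E"}))  # plus one contaminant copy in E: ('root', 1)
```

When the family is present only in A and B, its origin is placed at node AB with no losses. A single contaminating copy annotated in the outgroup E moves the inferred origin to the root and forces one loss on the branch leading to (C,D). On larger trees, the number of such spurious losses can grow with the number of lineages lying between the true origin and the contaminated genome.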
ISSN: 2041-1723
DOI: 10.1038/s41467-024-45024-5