VISTA: A Tool for Fast Taxonomic Assignment of Viral Genome Sequences
Published in | Genomics, Proteomics & Bioinformatics Vol. 23; no. 1 |
---|---|
Format | Journal Article |
Language | English |
Published | England: Oxford University Press, 10.05.2025 |
ISSN | 1672-0229, 2210-3244 |
DOI | 10.1093/gpbjnl/qzae082 |
Summary: | The rapid expansion of the number of viral genome sequences in public databases necessitates a scalable, universal, and automated preliminary taxonomic framework for comprehensive virus studies. Here, we introduce Virus Sequence-based Taxonomy Assignment (VISTA), a computational tool that employs a novel pairwise sequence comparison system and an automatic demarcation threshold identification framework for virus taxonomy. Leveraging physicochemical property sequences, k-mer profiles, and machine learning techniques, VISTA constructs a robust distance-based framework for taxonomic assignment. Functionally similar to Pairwise Sequence Comparison (PASC), a widely used virus taxonomy assignment tool based on pairwise sequence comparison, VISTA demonstrates superior performance, providing significantly improved separation of taxonomic groups, more objective taxonomic demarcation thresholds, greatly enhanced speed, and a wider application scope. We successfully applied VISTA to 38 virus families, as well as to the class Caudoviricetes, demonstrating its scalability, robustness, and ability to automatically and accurately assign taxonomy to both prokaryotic and eukaryotic viruses. Furthermore, applying VISTA to 679 unclassified prokaryotic virus genomes recovered from metagenomic data identified 46 novel virus families. VISTA is available as both a command line tool and a user-friendly web portal at https://ngdc.cncb.ac.cn/vista. |
---|---|
Bibliography: | Current address for Tao Zhang (张韬): School of Public Health, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong Special Administrative Region 999077, China. Current address for Xinchang Zheng (郑欣畅): Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA. Tao Zhang (张韬) and Yiyun Liu (刘依云) contributed equally. |
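The summary names k-mer profiles, a distance-based framework, and demarcation thresholds without showing how such pieces fit together. The Python sketch below illustrates that general idea under our own assumptions; it is not VISTA's algorithm. It substitutes cosine distance between raw k-mer count profiles for VISTA's comparison system (the physicochemical property sequences and machine-learning components are not modeled), picks a cutoff from labeled reference genomes by maximizing pairwise agreement (an assumed stand-in for VISTA's automatic threshold identification), and groups genomes by single linkage below that cutoff. All function names (`kmer_profile`, `cosine_distance`, `pairwise_distances`, `best_threshold`, `group_by_threshold`) are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math

# Hypothetical helpers for illustration only; these are NOT VISTA's API or method.


def kmer_profile(seq: str, k: int = 4) -> Counter:
    """Count overlapping k-mers, skipping any window that contains a non-ACGT base."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def cosine_distance(p: Counter, q: Counter) -> float:
    """Return 1 - cosine similarity of two sparse k-mer count vectors (0 = same composition)."""
    dot = sum(p[km] * q[km] for km in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0  # treat an empty profile as maximally distant
    return 1.0 - dot / (norm_p * norm_q)


def pairwise_distances(genomes: dict[str, str], k: int = 4) -> dict[tuple[str, str], float]:
    """Compute the distance for every unordered pair of genomes."""
    profiles = {name: kmer_profile(seq, k) for name, seq in genomes.items()}
    return {(a, b): cosine_distance(profiles[a], profiles[b])
            for a, b in combinations(genomes, 2)}


def best_threshold(distances, labels: dict[str, str], candidates: list[float]) -> float:
    """Assumed rule: pick the cutoff whose 'same group if distance < cutoff' calls best match known labels."""
    def agreement(t: float) -> float:
        hits = sum((d < t) == (labels[a] == labels[b]) for (a, b), d in distances.items())
        return hits / len(distances)
    return max(candidates, key=agreement)  # ties go to the earliest candidate


def group_by_threshold(names, distances, threshold: float) -> list[set[str]]:
    """Single-linkage grouping: any pair closer than `threshold` ends up in the same group."""
    parent = {name: name for name in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d < threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


if __name__ == "__main__":
    # Toy reference genomes with made-up family labels, for illustration only.
    refs = {"v1": "ACGTACGTACGT", "v2": "ACGTACGTACGT", "v3": "GGGGGGGGGGGG"}
    dist = pairwise_distances(refs)
    cutoff = best_threshold(dist, {"v1": "F1", "v2": "F1", "v3": "F2"}, [0.5, 1.5])
    print(cutoff, group_by_threshold(refs, dist, cutoff))
```

On the toy input, the sketch selects the 0.5 cutoff and places `v1` and `v2` in one group, apart from `v3`. Single linkage is used only because it is the simplest rule that turns a distance cutoff into groups; it can chain distant genomes together through intermediates, and an all-pairs comparison like this one does not scale to large genome collections without further engineering.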