VISTA: A Tool for Fast Taxonomic Assignment of Viral Genome Sequences

Bibliographic Details
Published in: Genomics, Proteomics & Bioinformatics, Vol. 23, No. 1
Main Authors: Zhang (张韬), Tao; Liu (刘依云), Yiyun; Guo (郭栩彤), Xutong; Zhang (张欣然), Xinran; Zheng (郑欣畅), Xinchang; Zhang (张陌尘), Mochen; Bao (鲍一明), Yiming
Format: Journal Article
Language: English
Published: Oxford University Press, England, 10.05.2025
ISSN: 1672-0229; 2210-3244
DOI: 10.1093/gpbjnl/qzae082

Summary: The rapid expansion of the number of viral genome sequences in public databases necessitates a scalable, universal, and automated preliminary taxonomic framework for comprehensive virus studies. Here, we introduce Virus Sequence-based Taxonomy Assignment (VISTA), a computational tool that employs a novel pairwise sequence comparison system and an automatic demarcation threshold identification framework for virus taxonomy. Leveraging physicochemical property sequences, k-mer profiles, and machine learning techniques, VISTA constructs a robust distance-based framework for taxonomic assignment. Functionally similar to Pairwise Sequence Comparison (PASC), a widely used virus classification tool based on pairwise sequence comparison, VISTA outperforms it by providing significantly improved separation of taxonomic groups, more objective taxonomic demarcation thresholds, greatly enhanced speed, and a wider application scope. We successfully applied VISTA to 38 virus families, as well as to the class Caudoviricetes, demonstrating its scalability, robustness, and ability to automatically and accurately assign taxonomy to both prokaryotic and eukaryotic viruses. Furthermore, applying VISTA to 679 unclassified prokaryotic virus genomes recovered from metagenomic data identified 46 novel virus families. VISTA is available as both a command-line tool and a user-friendly web portal at https://ngdc.cncb.ac.cn/vista.
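The summary describes a distance-based framework built on k-mer profiles together with an automatically identified demarcation threshold. As a rough illustration of those two ideas only, the Python sketch below computes normalized DNA k-mer frequency profiles, pairwise Euclidean distances, and a single cut-off that minimizes misassigned same-group/different-group pairs. The function names, the choice of k = 3, the Euclidean distance, and the error-minimizing threshold rule are all assumptions made for illustration; this is not VISTA's algorithm, which additionally uses physicochemical property sequences and machine learning.

```python
from collections import Counter
from itertools import combinations, product
import math

# Hypothetical helper names throughout; this is NOT VISTA's implementation.
ALPHABET = "ACGT"


def kmer_profile(seq: str, k: int = 3) -> list[float]:
    """Relative frequencies of all 4**k DNA k-mers, in a fixed order.

    k-mers containing characters outside ACGT (e.g. N) are ignored.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers) or 1  # avoid division by zero
    return [counts[km] / total for km in kmers]


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(seqs: dict[str, str], k: int = 3) -> dict[tuple[str, str], float]:
    """Distance for every unordered pair of sequence names (pair keys sorted)."""
    profiles = {name: kmer_profile(s, k) for name, s in seqs.items()}
    return {
        (a, b): euclidean(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


def demarcation_threshold(distances, labels):
    """Return (cut-off, errors) minimizing misassigned pairs.

    A pair is misassigned if it shares a label but lies above the cut-off,
    or has different labels but lies at or below it.
    """
    pairs = [(d, labels[a] == labels[b]) for (a, b), d in distances.items()]
    best_t, best_err = None, math.inf
    # The error count can only change at observed distances, so scan those.
    for t in sorted({d for d, _ in pairs}):
        err = sum((same and d > t) or (not same and d <= t) for d, same in pairs)
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err


if __name__ == "__main__":
    # Toy sequences and group labels, invented purely for illustration.
    seqs = {
        "a1": "ATATATATATATGCAT",
        "a2": "ATATATATATATGCAA",
        "b1": "GCGCGCGCGCGCATGC",
        "b2": "GCGCGCGCGCGCATGG",
    }
    labels = {"a1": "X", "a2": "X", "b1": "Y", "b2": "Y"}
    print(demarcation_threshold(pairwise_distances(seqs), labels))
```

Scanning only the observed distances keeps the threshold search exact for this error count, because the number of misassigned pairs is constant between consecutive observed values. A real tool would also need to handle overlapping distance distributions across many ranks, which this single-cut-off sketch does not attempt.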
Author notes: Current address for Tao Zhang (张韬): School of Public Health, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong Special Administrative Region 999077, China
Current address for Xinchang Zheng (郑欣畅): Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
Tao Zhang (张韬) and Yiyun Liu (刘依云) contributed equally.