V-pipe: a computational pipeline for assessing viral genetic diversity from high-throughput sequencing data

Bibliographic Details
Published in	bioRxiv
Main Authors	Posada Cespedes, Susana; Seifert, David; Topolsky, Ivan; Metzner, Karin J; Beerenwinkel, Niko
Format	Paper
Language	English
Published	Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 11.06.2020
Edition	1.1
Subjects	Adaptation; Bioinformatics; Computer applications; Gene mapping; Genetic diversity; Genomics; Haplotypes; Markov chains; Mathematical models; Next-generation sequencing; Quality control; Statistical analysis; Viral infections; Virulence

More Information
Summary:	High-throughput sequencing technologies are increasingly used not only in viral genomics research but also in clinical surveillance and diagnostics. These technologies facilitate the assessment of genetic diversity in intra-host virus populations, which affects the transmission, virulence, and pathogenesis of viral infections. However, analysing viral diversity poses two major challenges. First, amplification and sequencing errors confound the identification of true biological variants; second, the large data volumes impose computational limitations. To support viral high-throughput sequencing studies, we developed V-pipe, a bioinformatics pipeline that combines various state-of-the-art statistical models and computational tools for automated end-to-end analysis of raw sequencing reads. V-pipe supports quality control, read mapping and alignment, low-frequency mutation calling, and inference of viral haplotypes. For generating high-quality read alignments, we developed a novel method, called ngshmmalign, based on profile hidden Markov models and tailored to small and highly diverse viral genomes. V-pipe also includes benchmarking functionality that provides a standardized environment for comparative evaluations of different pipeline configurations. We demonstrate this capability by assessing the impact of three different read aligners (Bowtie 2, BWA MEM, ngshmmalign) and two different variant callers (LoFreq, ShoRAH) on the performance of calling single-nucleotide variants in intra-host virus populations. V-pipe supports various pipeline configurations and is implemented in a modular fashion to facilitate adaptation to the continuously changing technology landscape. V-pipe is freely available at https://github.com/cbg-ethz/V-pipe.
Bibliography:	Working paper / preprint. Competing Interest Statement: The authors have declared no competing interest.
ISSN:	2692-8205
DOI:	10.1101/2020.06.09.142919