Comparison of sequencing methods and data processing pipelines for whole genome sequencing and minority single nucleotide variant (mSNV) analysis during an influenza A/H5N8 outbreak

Bibliographic Details
Published in	PloS one Vol. 15; no. 2; p. e0229326
Main Authors	Poen, Marjolein J; Pohlmann, Anne; Amid, Clara; Bestebroer, Theo M; Brookes, Sharon M; Brown, Ian H; Everett, Helen; Schapendonk, Claudia M E; Scheuer, Rachel D; Smits, Saskia L; Beer, Martin; Fouchier, Ron A M; Ellis, Richard J
Format	Journal Article
Language	English
Published	United States: Public Library of Science (PLoS), 20.02.2020
Subjects	Alignment; Animals; Biological properties; Biological samples; Biology and Life Sciences; Collaboration; Comparative analysis; Computer and Information Sciences; Conserved sequence; Data analysis; Data processing; Deoxyribonucleic acid; Disease Outbreaks - veterinary; Diseases; DNA; DNA sequencing; Drug resistance; Ducks - virology; Epidemics; Epidemiology; Gene sequencing; Genome, Viral; Genomes; Genomics; Germany; Hypotheses; Influenza; Influenza A; Influenza A Virus, H5N8 Subtype - classification; Influenza A Virus, H5N8 Subtype - genetics; Information management; Iran; Laboratories; Metadata; Methods; Molecular biology; Netherlands; Next-generation sequencing; Nucleotides; Orthomyxoviridae Infections - veterinary; Orthomyxoviridae Infections - virology; Outbreaks; Pathogenic microorganisms; Pathogens; Pipelines; Platforms; Polymers; Polymorphism, Single Nucleotide; Research and Analysis Methods; RNA, Viral - analysis; RNA, Viral - genetics; Sequence Analysis, DNA; Standardization; Studies; Technology; United Kingdom; Virology; Viruses; Whole genome sequencing; Whole Genome Sequencing - methods
Summary:	As high-throughput sequencing technologies are becoming more widely adopted for analysing pathogens in disease outbreaks there needs to be assurance that the different sequencing technologies and approaches to data analysis will yield reliable and comparable results. Conversely, understanding where agreement cannot be achieved provides insight into the limitations of these approaches and also allows efforts to be focused on areas of the process that need improvement. This manuscript describes the next-generation sequencing of three closely related viruses, each analysed using different sequencing strategies, sequencing instruments and data processing pipelines. In order to determine the comparability of consensus sequences and minority (sub-consensus) single nucleotide variant (mSNV) identification, the biological samples, the sequence data from 3 sequencing platforms and the *.bam quality-trimmed alignment files of raw data of 3 influenza A/H5N8 viruses were shared. This analysis demonstrated that variation in the final result could be attributed to all stages in the process, but the most critical were the well-known homopolymer errors introduced by 454 sequencing, and the alignment processes in the different data processing pipelines which affected the consistency of mSNV detection. However, homopolymer errors aside, there was generally a good agreement between consensus sequences that were obtained for all combinations of sequencing platforms and data processing pipelines. Nevertheless, minority variant analysis will need a different level of careful standardization and awareness about the possible limitations, as shown in this study.
Bibliography:	Competing Interests: The authors have declared that no competing interests exist.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0229326