Classification of hepatitis C virus and human immunodeficiency virus-1 sequences with the branching index


Bibliographic Details
Published in Journal of general virology Vol. 89; no. 9; pp. 2098-2107
Main Authors Hraber, Peter; Kuiken, Carla; Waugh, Mark; Geer, Shaun; Bruno, William J; Leitner, Thomas
Format Journal Article
Language English
Published Reading Soc General Microbiol 01.09.2008
Society for General Microbiology
Summary: Theoretical Biology & Biophysics, T-10 MS K710, LANL, Los Alamos, NM 87545, USA. Correspondence: Peter Hraber, phraber{at}lanl.gov. Classification of viral sequences should be fast, objective, accurate and reproducible. Most methods that classify sequences use either pair-wise distances or phylogenetic relations, but cannot discern when a sequence is unclassifiable. The branching index (BI) combines distance and phylogeny methods to compute a ratio that quantifies how closely a query sequence clusters with a subtype clade. In the hypothesis-testing framework of statistical inference, the BI is compared with a threshold to test whether sufficient evidence exists for the query sequence to be classified among known sequences. If above the threshold, the null hypothesis of no support for the subtype relation is rejected and the sequence is taken as belonging to the subtype clade with which it clusters on the tree. This study evaluates statistical properties of the BI for subtype classification in hepatitis C virus (HCV) and human immunodeficiency virus-1 (HIV-1). Pairs of BI values with known positive- and negative-test results were computed from 10 000 random fragments of reference alignments. Sampled fragments were of sufficient length to contain phylogenetic signals that grouped reference sequences together properly into subtype clades. For HCV, a threshold BI of 0.71 yields 95.1 % agreement with reference subtypes, with equal false-positive and false-negative rates. For HIV-1, a threshold of 0.66 yields 93.5 % agreement. Higher thresholds can be used where lower false-positive rates are required. In synthetic recombinants, regions without breakpoints are recognized accurately; regions with breakpoints do not represent any known subtype uniquely. Web-based services for viral subtype classification with the BI are available online. Present address: Department of Sociology, UC Davis, CA 95616, USA.
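The threshold test described in the summary can be illustrated with a minimal Python sketch. It does not compute the BI itself, since the summary does not give the formula; it assumes a BI value and the clade the query clusters with are already available. The function names (`classify`, `error_rates`, `balanced_threshold`) and the toy BI values in the usage lines are hypothetical and chosen only for illustration; the two thresholds are the ones reported above.

```python
from typing import Optional, Sequence, Tuple

# Thresholds reported in the summary (equal false-positive and
# false-negative rates on the reference alignments).
THRESHOLDS = {"HCV": 0.71, "HIV-1": 0.66}


def classify(bi: float, clade: str, virus: str) -> Optional[str]:
    """Return the clade when the BI is above the threshold, else None.

    None stands for "unclassifiable": the null hypothesis of no support
    for the subtype relation is not rejected.
    """
    return clade if bi > THRESHOLDS[virus] else None


def error_rates(pos: Sequence[float], neg: Sequence[float],
                t: float) -> Tuple[float, float]:
    """False-positive and false-negative rates at threshold t.

    pos: BI values from known positive tests (should be classified).
    neg: BI values from known negative tests (should not be classified).
    """
    fn = sum(b <= t for b in pos) / len(pos)  # positives missed
    fp = sum(b > t for b in neg) / len(neg)   # negatives accepted
    return fp, fn


def balanced_threshold(pos: Sequence[float], neg: Sequence[float],
                       candidates: Sequence[float]) -> float:
    """Pick the candidate threshold where the two error rates are closest."""
    def gap(t: float) -> float:
        fp, fn = error_rates(pos, neg, t)
        return abs(fp - fn)
    return min(candidates, key=gap)


# Toy values for illustration only; not data from the study.
toy_pos = [0.90, 0.80, 0.60, 0.95]
toy_neg = [0.20, 0.50, 0.75, 0.30]
chosen = balanced_threshold(toy_pos, toy_neg, [0.1, 0.4, 0.7, 0.9])
```

Raising the threshold passed to a rule like `classify` trades more unclassified queries for fewer false positives, which matches the summary's remark that higher thresholds can be used where lower false-positive rates are required.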
ISSN:0022-1317
1465-2099
DOI:10.1099/vir.0.83657-0