HIV-specific probabilistic models of protein evolution

Bibliographic Details
Published in	PLoS ONE, Vol. 2, no. 6, p. e503
Main Authors	Nickle, David C, Heath, Laura, Jensen, Mark A, Gilbert, Peter B, Mullins, James I, Kosakovsky Pond, Sergei L
Format	Journal Article
Language	English
Published	United States: Public Library of Science (PLoS), 06.06.2007
Subjects	Acids; Alignment; Amino acid sequence; Amino Acid Substitution; Amino acids; Bioinformatics; Comparative analysis; Deoxyribonucleic acid; DNA; Empirical analysis; Empirical models; Evolution; Evolution (Biology); Evolution, Molecular; Evolutionary Biology/Bioinformatics; Evolutionary Biology/Microbial Evolution and Genomics; Genes; Genetic algorithms; Genetics and Genomics/Microbial Evolution and Genomics; Genomes; Genomics; Hepatitis; Hepatitis C; HIV; Human immunodeficiency virus; Human Immunodeficiency Virus Proteins - chemistry; Human Immunodeficiency Virus Proteins - genetics; Human Immunodeficiency Virus Proteins - metabolism; Humans; Hypotheses; Hypothesis testing; Influenza A; Mathematical models; Medicine; Methods; Mitochondrial DNA; Models, Statistical; Molecular Biology/Molecular Evolution; Mutation; Nucleotide sequence; Phylogenetics; Phylogeny; Probabilistic models; Proteins; Sequence Analysis, Protein; Similarity; Vaccines; Virology/Immunodeficiency Viruses; Virology/Virus Evolution and Symbiosis; Viruses; United States > US
Summary:	Comparative sequence analyses, including such fundamental bioinformatics techniques as similarity searching, sequence alignment and phylogenetic inference, have become a mainstay for researchers studying type 1 Human Immunodeficiency Virus (HIV-1) genome structure and evolution. Implicit in comparative analyses is an underlying model of evolution, and the chosen model can significantly affect the results. In general, evolutionary models describe the probabilities of replacing one amino acid character with another over a period of time. Most widely used evolutionary models for protein sequences have been derived from curated alignments of hundreds of proteins, usually based on mammalian genomes. It is unclear to what extent these empirical models are generalizable to a very different organism, such as HIV-1, the most extensively sequenced organism in existence. We developed a maximum likelihood model-fitting procedure, applied it to a collection of HIV-1 alignments sampled from different viral genes, and inferred two empirical substitution models suitable for describing between- and within-host evolution. Our procedure pools the information from multiple sequence alignments, and the provided software implementation can be run efficiently in parallel on a computer cluster. We describe how the inferred substitution models can be used to generate scoring matrices suitable for alignment and similarity searches. Our models had a consistently superior fit relative to the best existing models and to parameter-rich data-driven models when benchmarked on independent HIV-1 alignments, demonstrating evolutionary biases in amino-acid substitution that are unique to HIV and that are not captured by the existing models. The scoring matrices derived from the models showed a marked difference from common amino-acid scoring matrices. The use of an appropriate evolutionary model recovered a known viral transmission history, whereas a poorly chosen model introduced phylogenetic error. We argue that our model derivation procedure is immediately applicable to other organisms with extensive sequence data available, such as Hepatitis C and Influenza A viruses.
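As a rough illustration of two ideas in the summary (a substitution model gives the probabilities of replacing one character with another over time, and such a model can be turned into a scoring matrix), the following Python sketch uses a made-up three-letter alphabet. The frequencies, exchangeabilities, time value and all function names are illustrative assumptions, not values or code from the article. The log-odds formula is the standard construction for scoring matrices and may differ in detail from the authors' procedure.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, terms=30):
    """Approximate exp(Q * t) with a truncated Taylor series.

    Adequate for this small toy matrix; real software would use a more
    robust method.
    """
    n = len(q)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # current series term, starts at I
    qt = [[q[i][j] * t for j in range(n)] for i in range(n)]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, qt)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def build_rate_matrix(exchange, freqs):
    """Reversible rate matrix: q_ij = s_ij * pi_j (i != j), rows sum to 0."""
    n = len(freqs)
    q = [[exchange[i][j] * freqs[j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])
    return q


def log_odds_scores(p, freqs):
    """Score s_ij = log2(P_ij(t) / pi_j): observed vs. chance pairing."""
    n = len(freqs)
    return [[math.log2(p[i][j] / freqs[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical toy values (assumptions for illustration only).
FREQS = [0.5, 0.3, 0.2]                 # equilibrium frequencies pi
EXCHANGE = [[0.0, 1.0, 0.2],            # symmetric exchangeabilities s_ij
            [1.0, 0.0, 0.5],
            [0.2, 0.5, 0.0]]

Q = build_rate_matrix(EXCHANGE, FREQS)
P = mat_exp(Q, 0.5)                     # replacement probabilities at t = 0.5
S = log_odds_scores(P, FREQS)           # scoring matrix in bits

for row in S:
    print(" ".join(f"{score:7.3f}" for score in row))
```

Because the toy model is time-reversible, the resulting score matrix is symmetric, and identical pairs score above zero. Larger values of `t` pull every score toward zero, which is why scoring matrices are usually tied to a chosen level of divergence.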
Bibliography:	Conceived and designed the experiments: DN JM SK. Performed the experiments: DN SK. Analyzed the data: DN PG SK. Contributed reagents/materials/analysis tools: DN SK MJ LH. Wrote the paper: DN JM PG SK. Current address: Department of HABE (Epidemiology), College of Public Health, and Department of Genetics in the Franklin College of Arts and Sciences, University of Georgia, Athens, Georgia, United States of America
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0000503