MPI-PHYLIP: parallelizing computationally intensive phylogenetic analysis routines for the analysis of large protein families

Bibliographic Details
Published in	PLoS ONE, Vol. 5, no. 11, p. e13999
Main Authors	Ropelewski, Alexander J, Nicholas, Hugh B, Gonzalez Mendez, Ricardo R
Format	Journal Article
Language	English
Published	United States: Public Library of Science (PLoS), 15.11.2010
Subjects	Algorithms; Animals; Biochemistry/Bioinformatics; Biochemistry/Molecular Evolution; Bioinformatics; Cladistic analysis; Computational Biology - methods; Computational Biology/Macromolecular Sequence Analysis; Computer memory; Computer Science/Applications; Data collection; Datasets; Dehydrogenases; Epidemiology; Evolution; Evolutionary Biology/Bioinformatics; Gene sequencing; Genomes; Humans; Identification; Information theory; International conferences; Mathematical analysis; Parallel processing; Phylogenetics; Phylogeny; Protein families; Proteins; Proteins - classification; Proteins - genetics; Reproducibility of Results; Resampling; Routines; Sequence Alignment - methods; Simulation; Software; United States > US Pittsburgh Pennsylvania Pennsylvania

More Information
Summary:	Phylogenetic study of protein sequences provides unique and valuable insights into the molecular and genetic basis of important medical and epidemiological problems as well as insights about the origins and development of physiological features in present-day organisms. Consensus phylogenies based on the bootstrap and other resampling methods play a crucial part in analyzing the robustness of the trees produced for these analyses. Our focus was to increase the number of bootstrap replications that can be performed on large protein datasets using the maximum parsimony, distance matrix, and maximum likelihood methods. We have modified the PHYLIP package using MPI to enable large-scale phylogenetic study of protein sequences, using a statistically robust number of bootstrapped datasets, to be performed in a moderate amount of time. This paper discusses the methodology used to parallelize the PHYLIP programs and reports the performance of the parallel PHYLIP programs that are relevant to the study of protein evolution on several protein datasets. Calculations that currently take a few days on a state-of-the-art desktop workstation are reduced to calculations that can be performed over lunchtime on a modern parallel computer. Of the three protein methods tested, the maximum likelihood method scales the best, followed by the distance method, and then the maximum parsimony method. However, the maximum likelihood method requires significant memory resources, which limits its application to more moderately sized protein datasets.
Bibliography:	Conceived and designed the experiments: RRGM. Performed the experiments: AJR. Analyzed the data: AJR HBNJ RRGM. Contributed reagents/materials/analysis tools: AJR RRGM. Wrote the paper: AJR HBNJ RRGM.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0013999
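
Illustrative note: the summary describes running many bootstrap replicates of a protein alignment in parallel. The sketch below is not taken from the paper or from PHYLIP, and it does not call MPI. It only illustrates, under stated assumptions, how replicate indices might be divided among workers and how one replicate can be drawn by resampling alignment columns with replacement. The function names, the round-robin assignment, the per-replicate seeding scheme, and the toy sequences are all hypothetical; `rank` and `size` are plain integers standing in for an MPI process rank and communicator size.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Draw one bootstrap replicate by resampling columns with replacement.

    `alignment` maps sequence names to equal-length aligned strings.
    """
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def replicates_for_rank(n_replicates, rank, size):
    """Round-robin assignment of replicate indices to one worker (assumed scheme)."""
    return list(range(rank, n_replicates, size))


def run_rank(alignment, n_replicates, rank, size, base_seed=0):
    """Generate the replicates owned by one worker.

    Each replicate gets its own seed, so the result for a given replicate
    does not depend on how many workers share the job.
    """
    results = {}
    for rep in replicates_for_rank(n_replicates, rank, size):
        rng = random.Random(base_seed + rep)
        results[rep] = bootstrap_alignment(alignment, rng)
    return results


if __name__ == "__main__":
    # Toy alignment; the sequences are made up for illustration.
    toy = {"seqA": "MKVLA", "seqB": "MKILA", "seqC": "MRVLG"}
    size = 4
    merged = {}
    for rank in range(size):  # simulate the workers one after another
        merged.update(run_rank(toy, 10, rank, size))
    print(sorted(merged))
```

In a real MPI program each worker would run only its own share and the per-replicate trees would then be gathered for the consensus step; how the paper's implementation handles these stages is described in the full text, not in this record.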