Efficient computation of the phylogenetic likelihood function on multi-gene alignments and multi-core architectures

Bibliographic Details
Published in: Philosophical transactions of the Royal Society of London. Series B. Biological sciences, Vol. 363, no. 1512, pp. 3977-3984
Main Authors: Stamatakis, Alexandros; Ott, Michael
Format: Journal Article
Language: English
Published: London, The Royal Society, 27.12.2008

Summary: The continuous accumulation of sequence data, for example, due to novel wet-laboratory techniques such as pyrosequencing, coupled with the increasing popularity of multi-gene phylogenies and emerging multi-core processor architectures that face problems of cache congestion, poses new challenges with respect to the efficient computation of the phylogenetic maximum-likelihood (ML) function. Here, we propose two approaches that can significantly speed up likelihood computations, which typically account for over 95 per cent of the computational effort in current ML or Bayesian inference programs. First, we present a method and an appropriate data structure to efficiently compute the likelihood score on 'gappy' multi-gene alignments. By 'gappy' we denote sampling-induced gaps owing to missing sequences in individual genes (partitions), i.e. not real alignment gaps. A first proof-of-concept implementation in RAxML indicates that this approach can accelerate inferences on large and gappy alignments by approximately one order of magnitude. Second, we present insights and initial performance results on multi-core architectures obtained during the transition from an OpenMP-based to a Pthreads-based fine-grained parallelization of the ML function.
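
The summary does not describe the data structure itself, so what follows is only a minimal sketch of the general idea behind skipping sampling-induced gaps, not the method implemented in RAxML. It rests on a standard identity: a tip whose gene is missing has a conditional likelihood vector of all ones, and because every row of a transition matrix P(t) sums to 1, the factor sum_y P_xy(t) * 1 equals 1. Any subtree in which no taxon carries data for a partition therefore also yields an all-ones vector and need not be computed. The sketch assumes a Jukes-Cantor DNA model with equal base frequencies and a hypothetical per-node bit vector (`mask`, bit p set when some tip below has data for partition p); all names (`Node`, `build_masks`, `clv`, `log_likelihood`) are illustrative.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional

STATES = "ACGT"
MISSING = "-N?"  # characters treated as fully undetermined

# A conditional likelihood vector: one 4-entry list per site.
# None is a hypothetical shorthand for "all ones" (no data in the subtree).
CLV = Optional[List[List[float]]]


def jc_matrix(t: float) -> List[List[float]]:
    """Jukes-Cantor transition probabilities P(t); every row sums to 1."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if x == y else diff for y in range(4)] for x in range(4)]


@dataclass
class Node:
    name: str = ""
    children: List["Node"] = field(default_factory=list)
    branch: float = 0.1  # length of the branch above this node
    seqs: Dict[int, str] = field(default_factory=dict)  # tips: partition -> sequence
    mask: int = 0  # hypothetical bit vector: bit p set iff some tip below has gene p


def build_masks(node: Node) -> int:
    """Post-order pass: a tip sets the bits of its genes, an inner node ORs its children."""
    node.mask = 0
    if not node.children:
        for p in node.seqs:
            node.mask |= 1 << p
    else:
        for child in node.children:
            node.mask |= build_masks(child)
    return node.mask


def tip_clv(seq: str) -> List[List[float]]:
    return [[1.0 if ch in MISSING or ch == s else 0.0 for s in STATES] for ch in seq]


def clv(node: Node, p: int, n_sites: int, skip: bool) -> CLV:
    """Conditional likelihood vector of partition p at node (Felsenstein pruning)."""
    if skip and not (node.mask >> p) & 1:
        return None  # no taxon below has gene p: the vector is all ones, skip it
    if not node.children:
        return tip_clv(node.seqs.get(p, "-" * n_sites))
    out = [[1.0] * 4 for _ in range(n_sites)]
    for child in node.children:
        v = clv(child, p, n_sites, skip)
        if v is None:
            continue  # sum_y P[x][y] * 1 == 1, so this factor is 1
        P = jc_matrix(child.branch)
        for i in range(n_sites):
            for x in range(4):
                out[i][x] *= sum(P[x][y] * v[i][y] for y in range(4))
    return out


def log_likelihood(root: Node, n_sites: Dict[int, int], skip: bool = True) -> float:
    """Sum of per-partition log-likelihoods, equal base frequencies at the root."""
    total = 0.0
    for p, n in n_sites.items():
        v = clv(root, p, n, skip)
        if v is not None:  # a gene absent from every taxon contributes log(1) = 0
            total += sum(math.log(0.25 * sum(site)) for site in v)
    return total


# Usage: gene 1 is missing for taxa b and c, so the (b, c) subtree is skipped for it.
ta = Node("a", seqs={0: "ACGT", 1: "AAC"})
tb = Node("b", seqs={0: "ACGA"})
tc = Node("c", seqs={0: "TCGT"})
td = Node("d", seqs={0: "ACTT", 1: "AAG"})
bc = Node(children=[tb, tc])
root = Node("root", children=[Node(children=[ta, td]), bc])
build_masks(root)
sizes = {0: 4, 1: 3}
print(log_likelihood(root, sizes, skip=True), log_likelihood(root, sizes, skip=False))
```

In this sketch the bit vectors are recomputed once per topology in a single post-order pass, and both calls return the same score up to rounding; the skipped work grows with the fraction of missing genes per subtree. The speedups quoted in the summary refer to the authors' RAxML implementation, not to this illustration.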
Discussion Meeting Issue 'Statistical and computational challenges in molecular phylogenetics and evolution' organized by Ziheng Yang and Nick Goldman
ISSN: 0962-8436; 1471-2970
DOI: 10.1098/rstb.2008.0163