Phylo-MCOA: a fast and efficient method to detect outlier genes and species in phylogenomics using multiple co-inertia analysis

Bibliographic Details
Published in: Molecular Biology and Evolution, Vol. 29, no. 6, pp. 1587-1598
Main Authors: de Vienne, Damien M.; Ollier, Sébastien; Aguileta, Gabriela
Format: Journal Article
Language: English
Published: United States, Oxford Publishing Limited (England), 01.06.2012
Oxford University Press (OUP)
Summary: Full genome data sets are now routinely used to infer phylogenetic trees, but the trees produced by different genes are often discordant. An important goal in phylogenomics is to identify which individual genes and species produce the same phylogenetic tree and are thus likely to share the same evolutionary history. It is equally essential to identify which genes and species produce discordant topologies and therefore either evolve in a different way or represent noise in the data. The latter are outlier genes or species, and they can provide a wealth of information on potentially interesting biological processes, such as incomplete lineage sorting, hybridization, and horizontal gene transfer. Here, we propose a new method to explore the genomic tree space and detect outlier genes and species based on multiple co-inertia analysis (MCOA), which efficiently captures and compares the similarities in the phylogenetic topologies produced by individual genes. Our method allows the rapid identification of outlier genes and species by extracting the similarities and discrepancies, in terms of pairwise distances, between all the species in all the trees simultaneously. This is achieved with MCOA, which finds successive decomposition axes from individual ordinations (i.e., ordinations derived from distance matrices) that maximize a covariance function. The method is freely available as a set of R functions. The source code and tutorial can be found online at http://phylomcoa.cgenomics.org.
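
The idea of comparing pairwise species distances across all gene trees at once can be illustrated with a minimal sketch. This is not MCOA and not the authors' R code: it is a much simpler stand-in that scores each species in each gene by how far its normalized distances deviate from the across-gene average. The function names (`normalize`, `discordance_scores`), the toy matrices, and the species order are all hypothetical, chosen only for illustration.

```python
from statistics import mean


def normalize(matrix):
    """Scale a distance matrix so that its off-diagonal entries average to 1."""
    n = len(matrix)
    off_diag = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    scale = mean(off_diag)
    return [[d / scale for d in row] for row in matrix]


def discordance_scores(gene_matrices):
    """Return scores[gene][i]: the mean absolute deviation of species i's
    distances in that gene from the across-gene average distances.

    Simplified stand-in for illustration only; it does not perform MCOA.
    """
    genes = list(gene_matrices)
    normed = {g: normalize(gene_matrices[g]) for g in genes}
    n = len(normed[genes[0]])
    # Average normalized distance for every species pair across all genes.
    consensus = [
        [mean(normed[g][i][j] for g in genes) for j in range(n)]
        for i in range(n)
    ]
    return {
        g: [
            mean(abs(normed[g][i][j] - consensus[i][j])
                 for j in range(n) if j != i)
            for i in range(n)
        ]
        for g in genes
    }


# Hypothetical toy data: four species (A, B, C, D) and three genes.
# gene1 and gene2 agree on ((A,B),(C,D)); gene3 supports ((A,D),(B,C)).
agree = [[0, 2, 4, 4], [2, 0, 4, 4], [4, 4, 0, 2], [4, 4, 2, 0]]
differ = [[0, 4, 4, 2], [4, 0, 2, 4], [4, 2, 0, 4], [2, 4, 4, 0]]
scores = discordance_scores({"gene1": agree, "gene2": agree, "gene3": differ})

# The gene with the largest total deviation is the candidate outlier gene.
outlier_gene = max(scores, key=lambda g: sum(scores[g]))
print(outlier_gene, scores[outlier_gene])
```

In practice, the distance matrices would be derived from the gene trees themselves (for example, patristic or nodal distances), and the ordination-based MCOA approach described in the summary replaces the simple averaging used here.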
ISSN: 0737-4038, 1537-1719
DOI: 10.1093/molbev/msr317