WGDTree: a phylogenetic software tool to examine conditional probabilities of retention following whole genome duplication events

Multiple processes impact the probability of retention of individual genes following whole genome duplication (WGD) events. In analyzing two consecutive whole genome duplication events that occurred in the lineage leading to Atlantic salmon, a new phylogenetic statistical analysis was developed to e...

Full description

Saved in:
Bibliographic Details
Published inBMC bioinformatics Vol. 23; no. 1; pp. 505 - 15
Main Authors Henry, C Nicholas, Piper, Kathryn, Wilson, Amanda E, Miraszek, John L, Probst, Claire S, Rong, Yuying, Liberles, David A
Format Journal Article
LanguageEnglish
Published England BioMed Central Ltd 24.11.2022
BioMed Central
BMC
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Multiple processes impact the probability of retention of individual genes following whole genome duplication (WGD) events. In analyzing two consecutive whole genome duplication events that occurred in the lineage leading to Atlantic salmon, a new phylogenetic statistical analysis was developed to examine the contingency of retention in one event based upon retention in a previous event. This analysis is intended to evaluate mechanisms of duplicate gene retention and to provide software to generate the test statistic for any genome with pairs of WGDs in its history. Here a software package written in Python, 'WGDTree' for the analysis of duplicate gene retention following whole genome duplication events is presented. Using gene tree-species tree reconciliation to label gene duplicate nodes and differentiate between WGD and SSD duplicates, the tool calculates a statistic based upon the conditional probability of a gene duplicate being retained after a second whole genome duplication dependent upon the retention status after the first event. The package also contains methods for the simulation of gene trees with WGD events. After running simulations, the accuracy of the placement of events has been determined to be high. The conditional probability statistic has been calculated for Phalaenopsis equestris on a monocot species tree with a pair of consecutive WGD events on its lineage, showing the applicability of the method. A new software tool has been created for the analysis of duplicate genes in examination of retention mechanisms. The software tool has been made available on the Python package index and the source code can be found on GitHub here: https://github.com/cnickh/wgdtree .
ISSN:1471-2105
1471-2105
DOI:10.1186/s12859-022-05042-w