A Phylogenetic Framework to Simulate Synthetic Interspecies RNA-Seq Data

Bibliographic Details
Published in: Molecular Biology and Evolution, Vol. 40, no. 1, pp. 1-14
Main Authors: Bastide, Paul; Soneson, Charlotte; Stern, David B.; Lespinet, Olivier; Gallopin, Mélina
Format: Journal Article
Language: English
Published: Oxford University Press (OUP), US, 04.01.2023
More Information
Summary: Interspecies RNA-Seq datasets are increasingly common and have the potential to answer new questions about the evolution of gene expression. Single-species differential expression analysis is now a well-studied problem that benefits from sound statistical methods. Extensive reviews on biological or synthetic datasets have given the community a clear picture of the relative performance of the available methods in various settings. However, tools for simulating synthetic datasets are still missing in the interspecies gene expression context. In this work, we develop and implement a new simulation framework. This tool builds on both the RNA-Seq and the phylogenetic comparative methods literature to generate realistic count datasets while taking into account the phylogenetic relationships between the samples. We illustrate the usefulness of this new framework through a targeted simulation study that reproduces the features of a recently published dataset containing gene expression data in adult eye tissue across blind and sighted freshwater crayfish species. Using our simulated datasets, we perform a fair comparison of several approaches used for differential expression analysis. This benchmark reveals some of the strengths and weaknesses of both the classical and phylogenetic approaches for interspecies differential expression analysis, and allows for a reanalysis of the crayfish dataset. The tool has been integrated into the R package compcodeR, freely available on Bioconductor.
ISSN: 0737-4038, 1537-1719
DOI: 10.1093/molbev/msac269