Gene Set Enrichment Analysis in RNA-Seq Data

Bibliographic Details
Published in Journal of Data Science Vol. 18; no. 4; pp. 632 - 648
Main Authors Tsai, Chen-An; Li, Pei-Hsun
Format Journal Article
Language English
Published 中華資料採礦協會 01.10.2020
Summary: To date, many gene set analysis (GSA) approaches have been developed for identifying differentially expressed gene sets using microarray data. However, these methods are not directly applicable to RNA-Seq data because of intrinsic differences between the two data structures. When testing the differential expression of gene sets, most GSA methods rely on a critical assumption: that the members of each gene set are sampled independently. This assumption implies that the genes within a gene set do not share a common biological function. The aim of this paper is twofold. First, we propose a powerful yet simple extension to GSA methods based on the de-correlation (DECO) algorithm, which properly removes the correlation bias in the expression of each gene set. We then compare the performance of our proposed method with that of other GSA methods using a real RNA-Seq dataset and simulation studies under various scenarios, combined with four commonly used normalization methods. Second, we discuss the effect of the complex correlation structure of gene sets on the four normalization methods. We found that our proposed method outperforms the others in terms of Type I error rate and empirical power. A comparative study on a public dataset showed that gene sets identified by our proposed method have better concordance with biologically confirmed pathways than those identified by other methods.
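The summary does not spell out the steps of the DECO algorithm, so the following is only a minimal sketch of the general idea of removing within-set correlation before testing. It assumes that de-correlation can be illustrated by Cholesky whitening of the sample covariance of one gene set; this is a stand-in technique, not the authors' method. The function names (covariance, cholesky, decorrelate) and the toy data are hypothetical, and only the Python standard library is used.

```python
import math
import random


def covariance(rows):
    """Return (sample covariance matrix, centered data) for a samples x genes table."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    return cov, centered


def cholesky(a):
    """Lower-triangular L with L L^T = a; a must be symmetric positive definite."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("covariance is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def decorrelate(rows):
    """Whiten one gene set (assumed stand-in for de-correlation): z = L^{-1} x per sample."""
    cov, centered = covariance(rows)
    L = cholesky(cov)
    p = len(L)
    out = []
    for x in centered:
        z = [0.0] * p
        for i in range(p):  # forward substitution solves L z = x
            z[i] = (x[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
        out.append(z)
    return out


# Toy gene set: 3 hypothetical genes driven by one shared factor, 200 samples.
random.seed(0)
samples = []
for _ in range(200):
    shared = random.gauss(0, 1)
    samples.append([shared + 0.5 * random.gauss(0, 1) for _ in range(3)])

cov_before, _ = covariance(samples)
cov_after, _ = covariance(decorrelate(samples))
```

In this toy example the genes are strongly correlated before the transform, and the covariance of the transformed values is the identity matrix up to rounding. Whitening needs more samples than genes in the set; with fewer, the sample covariance is singular and the sketch raises an error rather than guessing a regularization.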
ISSN:1683-8602
1680-743X
DOI:10.6339/JDS.202010_18(4).0003