Gene Set Enrichment Analysis in RNA-Seq Data
Published in | Journal of Data Science Vol. 18; no. 4; pp. 632 - 648 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | 中華資料採礦協會, 01.10.2020 |
Summary: | To date, many gene set analysis (GSA) approaches have been developed for identifying differentially expressed gene sets using microarray data. However, these methods are not directly applicable to RNA-Seq data because of intrinsic differences between the two data structures. When testing the differential expression of gene sets, most GSA methods make the critical assumption that the members of each gene set are sampled independently. In effect, this assumes that the genes within a gene set do not share a common biological function. The aim of this paper is twofold. First, we propose a powerful yet simple extension to GSA methods, based on the de-correlation (DECO) algorithm, that properly removes the correlation bias in the expression of each gene set. We then study the performance of our proposed method compared with other GSA methods on a real RNA-Seq dataset and in simulation studies under various scenarios, combined with four commonly used normalization methods. Second, we discuss the effect of the complex correlation structure of gene sets on the four normalization methods. We found that our proposed method outperforms the others in terms of Type I error rate and empirical power. A comparative study on a public dataset showed that the gene sets identified by our proposed method have better concordance with biologically confirmed pathways than those identified by other methods. |
---|---|
ISSN: | 1683-8602 1680-743X |
DOI: | 10.6339/JDS.202010_18(4).0003 |
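
The summary's point about correlation bias can be illustrated with a small calculation. The sketch below is not the paper's DECO algorithm. It only shows, under stated assumptions, why ignoring within-set correlation inflates the Type I error of a set-level test. The assumptions are that the gene-level statistics are standard normal under the null, that every pair of genes in the set shares the same correlation `rho`, and that the set-level test is a two-sided z-test on the mean statistic. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def variance_inflation(m, rho):
    """Variance inflation factor for the mean of m unit-variance
    gene-level statistics with common pairwise correlation rho.

    Var(sqrt(m) * mean) = 1 + (m - 1) * rho, which equals 1 only
    when the genes are uncorrelated (rho = 0).
    """
    return 1.0 + (m - 1) * rho


def naive_type1_error(m, rho, alpha=0.05):
    """Actual two-sided Type I error of a z-test on the mean statistic
    that wrongly assumes the m genes are independent.

    Assumption (illustrative only): statistics are jointly normal
    with equal pairwise correlation rho under the null hypothesis.
    """
    # Critical value the naive test uses, calibrated for independence.
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # True standard deviation of the scaled mean under correlation.
    true_sd = math.sqrt(variance_inflation(m, rho))
    # Probability that the scaled mean exceeds the naive threshold.
    return 2.0 * (1.0 - NormalDist().cdf(z_crit / true_sd))


if __name__ == "__main__":
    for rho in (0.0, 0.05, 0.1):
        print(rho, round(naive_type1_error(20, rho), 3))
```

With 20 genes and a pairwise correlation of only 0.1, the naive test's actual error rate is roughly 0.25 rather than the nominal 0.05. A de-correlation step aims to remove this inflation before the set-level test is applied.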