Should We Abandon the t-Test in the Analysis of Gene Expression Microarray Data: A Comparison of Variance Modeling Strategies

High-throughput post-genomic studies are now routinely and promisingly investigated in biological and biomedical research. The main statistical approach to select genes differentially expressed between two groups is to apply a t-test, which is subject of criticism in the literature. Numerous alterna...

Full description

Saved in:
Bibliographic Details
Published inPloS one Vol. 5; no. 9; p. e12336
Main Authors Jeanmougin, Marine, de Reynies, Aurelien, Marisa, Laetitia, Paccard, Caroline, Nuel, Gregory, Guedj, Mickael
Format Journal Article
LanguageEnglish
Published United States Public Library of Science 03.09.2010
Public Library of Science (PLoS)
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:High-throughput post-genomic studies are now routinely and promisingly investigated in biological and biomedical research. The main statistical approach to select genes differentially expressed between two groups is to apply a t-test, which is subject of criticism in the literature. Numerous alternatives have been developed based on different and innovative variance modeling strategies. However, a critical issue is that selecting a different test usually leads to a different gene list. In this context and given the current tendency to apply the t-test, identifying the most efficient approach in practice remains crucial. To provide elements to answer, we conduct a comparison of eight tests representative of variance modeling strategies in gene expression data: Welch's t-test, ANOVA [1], Wilcoxon's test, SAM [2], RVM [3], limma [4], VarMixt [5] and SMVar [6]. Our comparison process relies on four steps (gene list analysis, simulations, spike-in data and re-sampling) to formulate comprehensive and robust conclusions about test performance, in terms of statistical power, false-positive rate, execution time and ease of use. Our results raise concerns about the ability of some methods to control the expected number of false positives at a desirable level. Besides, two tests (limma and VarMixt) show significant improvement compared to the t-test, in particular to deal with small sample sizes. In addition limma presents several practical advantages, so we advocate its application to analyze gene expression data.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ObjectType-Article-2
ObjectType-Feature-1
Conceived and designed the experiments: MJ GN MG. Performed the experiments: MJ. Analyzed the data: MJ MG. Wrote the paper: MJ MG. Implemented tools in R: MJ, AdR. Significantly contributed to the paper: AdR, LM, CP, GN.
ISSN:1932-6203
1932-6203
DOI:10.1371/journal.pone.0012336