Evaluating Statistical Methods Using Plasmode Data Sets in the Age of Massive Public Databases: An Illustration Using False Discovery Rates

Plasmode is a term coined several years ago to describe data sets that are derived from real data but for which some truth is known. Omic techniques, most especially microarray and genomewide association studies, have catalyzed a new zeitgeist of data sharing that is making data and data sets public...

Full description

Saved in:
Bibliographic Details
Published inPLoS genetics Vol. 4; no. 6; p. e1000098
Main Authors Gadbury, Gary L., Xiang, Qinfang, Yang, Lin, Barnes, Stephen, Page, Grier P., Allison, David B.
Format Journal Article
LanguageEnglish
Published United States Public Library of Science 01.06.2008
Public Library of Science (PLoS)
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Plasmode is a term coined several years ago to describe data sets that are derived from real data but for which some truth is known. Omic techniques, most especially microarray and genomewide association studies, have catalyzed a new zeitgeist of data sharing that is making data and data sets publicly available on an unprecedented scale. Coupling such data resources with a science of plasmode use would allow statistical methodologists to vet proposed techniques empirically (as opposed to only theoretically) and with data that are by definition realistic and representative. We illustrate the technique of empirical statistics by consideration of a common task when analyzing high dimensional data: the simultaneous testing of hundreds or thousands of hypotheses to determine which, if any, show statistical significance warranting follow-on research. The now-common practice of multiple testing in high dimensional experiment (HDE) settings has generated new methods for detecting statistically significant results. Although such methods have heretofore been subject to comparative performance analysis using simulated data, simulating data that realistically reflect data from an actual HDE remains a challenge. We describe a simulation procedure using actual data from an HDE where some truth regarding parameters of interest is known. We use the procedure to compare estimates for the proportion of true null hypotheses, the false discovery rate (FDR), and a local version of FDR obtained from 15 different statistical methods.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ObjectType-Article-2
ObjectType-Feature-1
ObjectType-Undefined-3
Conceived and designed the experiments: GG DA. Performed the experiments: QX. Analyzed the data: LY. Wrote the paper: GG SB GP DA.
ISSN:1553-7404
1553-7390
1553-7404
DOI:10.1371/journal.pgen.1000098