The effect of oligonucleotide microarray data pre-processing on the analysis of patient-cohort studies

Bibliographic Details
Published in	BMC bioinformatics Vol. 7; no. 1; p. 105
Main Authors	Verhaak, Roel G W; Staal, Frank J T; Valk, Peter J M; Lowenberg, Bob; Reinders, Marcel J T; de Ridder, Dick
Format	Journal Article
Language	English
Published	England: BioMed Central Ltd, 02.03.2006
Subjects	Algorithms; Biomarkers, Tumor - genetics; Biomarkers, Tumor - metabolism; Clinical Trials as Topic; Cohort Studies; Databases, Genetic; Diagnosis, Computer-Assisted - methods; Gene Expression Profiling - methods; Humans; Neoplasm Proteins - genetics; Neoplasm Proteins - metabolism; Neoplasms - diagnosis; Neoplasms - genetics; Neoplasms - metabolism; Oligonucleotide Array Sequence Analysis - methods; Reproducibility of Results; Sensitivity and Specificity; Software; Software Validation

Summary:	Intensity values measured by Affymetrix microarrays have to be both normalized, to be able to compare different microarrays by removing non-biological variation, and summarized, generating the final probe set expression values. Various pre-processing techniques, such as dChip, GCRMA, RMA and MAS have been developed for this purpose. This study assesses the effect of applying different pre-processing methods on the results of analyses of large Affymetrix datasets. By focusing on practical applications of microarray-based research, this study provides insight into the relevance of pre-processing procedures to biology-oriented researchers. Using two publicly available datasets, i.e., gene-expression data of 285 patients with Acute Myeloid Leukemia (AML, Affymetrix HG-U133A GeneChip) and 42 samples of tumor tissue of the embryonal central nervous system (CNS, Affymetrix HuGeneFL GeneChip), we tested the effect of the four pre-processing strategies mentioned above, on (1) expression level measurements, (2) detection of differential expression, (3) cluster analysis and (4) classification of samples. In most cases, the effect of pre-processing is relatively small compared to other choices made in an analysis for the AML dataset, but has a more profound effect on the outcome of the CNS dataset. Analyses on individual probe sets, such as testing for differential expression, are affected most; supervised, multivariate analyses such as classification are far less sensitive to pre-processing. Using two experimental datasets, we show that the choice of pre-processing method is of relatively minor influence on the final analysis outcome of large microarray studies whereas it can have important effects on the results of a smaller study. The data source (platform, tissue homogeneity, RNA quality) is potentially of bigger importance than the choice of pre-processing method.
ISSN:	1471-2105
DOI:	10.1186/1471-2105-7-105
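
The summary names two pre-processing steps, normalization (making arrays comparable) and summarization (collapsing probe intensities into one value per probe set). The sketch below illustrates both in Python with NumPy, under stated assumptions: it uses quantile normalization followed by a plain median of log2 intensities, which is a simplification and not the implementation of dChip, GCRMA, RMA or MAS (RMA, for example, also applies background correction and summarizes by median polish). The function names, the probes-by-arrays matrix layout and the `probe_set_ids` vector are hypothetical choices made for illustration.

```python
import numpy as np


def quantile_normalize(intensities: np.ndarray) -> np.ndarray:
    """Give every array (column) the same empirical intensity distribution.

    intensities: probes x arrays matrix of raw intensities (assumed layout).
    Ties within a column are broken arbitrarily rather than averaged.
    """
    # Sort order of the probes within each array.
    order = np.argsort(intensities, axis=0)
    sorted_vals = np.take_along_axis(intensities, order, axis=0)
    # Reference distribution: mean intensity at each rank across arrays.
    reference = sorted_vals.mean(axis=1)
    # Write the reference value for each rank back to the probe holding that rank.
    normalized = np.empty_like(intensities, dtype=float)
    np.put_along_axis(normalized, order, reference[:, None], axis=0)
    return normalized


def summarize_probe_sets(normalized: np.ndarray, probe_set_ids: np.ndarray) -> dict:
    """Collapse probes into one log2 expression value per probe set and array.

    Uses a simple median over the probes of each set (an assumption made
    here for brevity, not a method taken from the catalogued article).
    """
    log_vals = np.log2(normalized)
    return {
        ps: np.median(log_vals[probe_set_ids == ps], axis=0)
        for ps in np.unique(probe_set_ids)
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated raw intensities: 8 probes measured on 3 arrays.
    raw = rng.lognormal(mean=7.0, sigma=1.0, size=(8, 3))
    ids = np.array(["set_a"] * 4 + ["set_b"] * 4)
    expression = summarize_probe_sets(quantile_normalize(raw), ids)
    for probe_set, values in expression.items():
        print(probe_set, np.round(values, 2))
```

After normalization every array has identical sorted intensities, so remaining differences between arrays reflect only the rank of each probe within its array. Swapping either function for another method is the kind of pre-processing choice whose downstream effect the study evaluates.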