Pre-processing Agilent microarray data

Bibliographic Details
Published in	BMC bioinformatics Vol. 8; no. 1; p. 142
Main Authors	Zahurak, Marianna; Parmigiani, Giovanni; Yu, Wayne; Scharpf, Robert B; Berman, David; Schaeffer, Edward; Shabbeer, Shabana; Cope, Leslie
Format	Journal Article
Language	English
Published	England: BioMed Central Ltd, 01.05.2007
Subjects	Algorithms; Animals; Cell Line, Tumor; Databases, Genetic - classification; Databases, Genetic - statistics & numerical data; DNA microarrays; Dogs; Gene Expression Profiling - methods; Gene Expression Profiling - statistics & numerical data; Humans; Mice; Oligonucleotide Array Sequence Analysis - methods; Oligonucleotide Array Sequence Analysis - statistics & numerical data; United States
Summary:	Pre-processing methods for two-sample long oligonucleotide arrays, specifically the Agilent technology, have not been extensively studied. The goal of this study is to quantify some of the sources of error that affect measurement of expression using Agilent arrays and to compare Agilent's Feature Extraction software with pre-processing methods that have become the standard for normalization of cDNA arrays. These include log transformation followed by loess normalization, with or without background subtraction, and often a between-array scale normalization procedure. The larger goal is to define best study design and pre-processing practices for Agilent arrays, and we offer some suggestions. Simple loess normalization without background subtraction produced the lowest variability. However, without background subtraction, fold changes were biased towards zero, particularly at low intensities. ROC analysis of a spike-in experiment showed that differentially expressed genes are most reliably detected when background is not subtracted: loess normalization with no background subtraction yielded an AUC of 99.7%, compared with 88.8% for Agilent-processed fold changes. All methods performed well when error was taken into account by t- or z-statistics (AUCs ≥ 99.8%). A substantial proportion of genes, 43% (99% CI: 39%, 47%), showed dye effects. However, these effects were generally small regardless of the pre-processing method. Simple loess normalization without background subtraction resulted in low-variance fold changes that ranked gene expression more reliably than the other methods. While t-statistics and other measures that take variation into account, including Agilent's z-statistic, can also be used to reliably select differentially expressed genes, fold changes are a standard measure of differential expression for exploratory work, cross-platform comparison, and biological interpretation, and cannot be entirely replaced. Although dye effects are small for most genes, many array features are affected. Therefore, an experimental design that incorporates dye swaps or a common reference could be valuable.
ISSN:	1471-2105
DOI:	10.1186/1471-2105-8-142
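
The Summary describes log transformation followed by loess normalization, with or without background subtraction. The following is a minimal sketch of that general idea for a single two-colour array, not the authors' code or Agilent's Feature Extraction algorithm. The function names, the `floor` value, and the smoothing span `frac` are illustrative assumptions, and the smoother is a plain local linear fit with tricube weights standing in for a full loess implementation.

```python
import math


def log_ratio_and_intensity(red, green, red_bg=None, green_bg=None, floor=1.0):
    """Return (M, A): M = log2(R/G), A = mean of log2(R) and log2(G).

    Background vectors are optional. If given, they are subtracted and the
    result is floored at `floor` (an assumed value) so the logs stay finite.
    """
    m_vals, a_vals = [], []
    for i, (r, g) in enumerate(zip(red, green)):
        if red_bg is not None:
            r = r - red_bg[i]
        if green_bg is not None:
            g = g - green_bg[i]
        log_r = math.log2(max(r, floor))
        log_g = math.log2(max(g, floor))
        m_vals.append(log_r - log_g)
        a_vals.append(0.5 * (log_r + log_g))
    return m_vals, a_vals


def loess_fit(x, y, frac=0.4):
    """Local linear fit with tricube weights, evaluated at every x.

    `frac` is the fraction of points used in each local fit (assumed default).
    """
    n = len(x)
    k = max(3, min(n, math.ceil(frac * n)))
    fitted = []
    for x0 in x:
        # k nearest neighbours of x0, with distances
        nearest = sorted((abs(xi - x0), xi, yi) for xi, yi in zip(x, y))[:k]
        h = nearest[-1][0] or 1e-12  # bandwidth = distance to farthest neighbour
        sw = swu = swy = swuu = swuy = 0.0
        for d, xi, yi in nearest:
            w = (1.0 - (d / h) ** 3) ** 3  # tricube weight
            u = xi - x0                    # centre on x0 for stability
            sw += w
            swu += w * u
            swy += w * yi
            swuu += w * u * u
            swuy += w * u * yi
        denom = sw * swuu - swu * swu
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)        # fall back to a weighted mean
        else:
            # intercept of the weighted line = fitted value at x0
            fitted.append((swuu * swy - swu * swuy) / denom)
    return fitted


def loess_normalize(m_vals, a_vals, frac=0.4):
    """Subtract the intensity-dependent trend of M on A from M."""
    trend = loess_fit(a_vals, m_vals, frac)
    return [m - t for m, t in zip(m_vals, trend)]
```

As a small arithmetic illustration of the background effect, a spot with raw intensities 200 and 100 has a log2 ratio of 1 without background subtraction, whereas subtracting a background of 50 from each channel gives log2(150/50), about 1.58. This is consistent with the Summary's observation that unsubtracted fold changes sit closer to zero.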