Accounting for probe-level noise in principal component analysis of microarray data

Bibliographic Details
Published in	Bioinformatics Vol. 21; no. 19; pp. 3748 - 3754
Main Authors	Sanguinetti, Guido; Milo, Marta; Rattray, Magnus; Lawrence, Neil D.
Format	Journal Article
Language	English
Published	Oxford: Oxford University Press, 01.10.2005; Oxford Publishing Limited (England)
Subjects	Algorithms; Biological and medical sciences; Data Interpretation, Statistical; Fundamental and applied biological sciences. Psychology; Gene Expression Profiling - methods; General aspects; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Models, Genetic; Models, Statistical; Oligonucleotide Array Sequence Analysis - methods; Principal Component Analysis; Software; Stochastic Processes; Gene cluster; Data analysis; Correlation; Probabilistic approach; Noise; Estimation; Information extraction; Review; Gene; Parameter; Bioinformatics; EM algorithm; Principal component analysis
Summary:	Motivation: Principal Component Analysis (PCA) is one of the most popular dimensionality reduction techniques for the analysis of high-dimensional datasets. However, in its standard form, it does not take into account any error measures associated with the data points beyond a standard spherical noise. This indiscriminate nature provides one of its main weaknesses when applied to biological data with inherently large variability, such as expression levels measured with microarrays. Methods now exist for extracting credibility intervals from the probe-level analysis of cDNA and oligonucleotide microarray experiments. These credibility intervals are gene and experiment specific, and can be propagated through an appropriate probabilistic downstream analysis. Results: We propose a new model-based approach to PCA that takes into account the variances associated with each gene in each experiment. We develop an efficient EM algorithm to estimate the parameters of our new model. The model provides significantly better results than standard PCA, while remaining computationally reasonable. We show how the model can be used to ‘denoise’ a microarray dataset leading to improved expression profiles and tighter clustering across profiles. The probabilistic nature of the model means that the correct number of principal components is automatically obtained. Availability: The software used in the paper is available from http://www.bioinf.man.ac.uk/resources/puma. The microarray data are deposited in the NCBI database. Contact: neil@dcs.shef.ac.uk
Bibliography:	istex:B82BC3BD97A5BBB77A847799B3E99A1557CBF06F; ark:/67375/HXZ-TXVZ3N7Z-K; local:bti617
ISSN:	1367-4803 1460-2059 1367-4811
DOI:	10.1093/bioinformatics/bti617
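
Illustrative sketch (not the authors' software): the summary describes a PCA-like latent variable model, fitted by EM, in which every measurement carries its own known variance. The NumPy code below is a minimal sketch of that general idea under simplifying assumptions that do not come from the record: each row y_n of the data is modelled as W x_n + mu + noise, the noise covariance is diag(S[n]) with S known, and no additional residual variance term is estimated. The names `e_step` and `noisy_pca_em` are hypothetical and are not part of the package referenced in the summary.

```python
import numpy as np


def e_step(Y, S, W, mu):
    """Posterior mean and covariance of each latent x_n given y_n.

    Assumes x_n ~ N(0, I) and known diagonal noise variances S[n].
    """
    P = 1.0 / S                                   # per-entry precisions, (N, d)
    q = W.shape[1]
    # M_n = I + W^T diag(P_n) W, one (q, q) matrix per data point
    M = np.eye(q) + np.einsum("jk,nj,jl->nkl", W, P, W)
    C = np.linalg.inv(M)                          # posterior covariances, (N, q, q)
    b = (P * (Y - mu)) @ W                        # W^T diag(P_n) (y_n - mu), (N, q)
    X = np.einsum("nkl,nl->nk", C, b)             # posterior means, (N, q)
    return X, C


def noisy_pca_em(Y, S, q, n_iter=100, seed=0):
    """EM sketch for a PCA-like model with known per-entry noise variances.

    Y: (N, d) data, S: (N, d) known variances, q: number of components.
    Returns loadings W (d, q), mean mu (d,), and latent means X (N, q).
    """
    rng = np.random.default_rng(seed)
    N, d = Y.shape
    P = 1.0 / S
    mu = (P * Y).sum(axis=0) / P.sum(axis=0)      # precision-weighted mean
    W = rng.standard_normal((d, q))               # random initial loadings
    for _ in range(n_iter):
        X, C = e_step(Y, S, W, mu)
        XX = C + np.einsum("nk,nl->nkl", X, X)    # E[x_n x_n^T]
        # With diagonal noise the rows of W decouple: each row w_j solves
        # (sum_n P_nj E[x x^T]_n) w_j = sum_n P_nj (y_nj - mu_j) E[x_n]
        A = np.einsum("nj,nkl->jkl", P, XX)       # (d, q, q)
        B = np.einsum("nj,nk->jk", P * (Y - mu), X)  # (d, q)
        W = np.linalg.solve(A, B[..., None])[..., 0]
        # Conditional update of the mean given the new loadings
        mu = (P * (Y - X @ W.T)).sum(axis=0) / P.sum(axis=0)
    X, _ = e_step(Y, S, W, mu)                    # latent means for final W, mu
    return W, mu, X
```

Under these assumptions, a 'denoised' version of the data can be read off as `X @ W.T + mu`, which down-weights entries with large reported variance. The published model may differ in its parameterisation and update equations, so this should be read only as an illustration of propagating per-measurement variances through a PCA-style analysis.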