Analysis of temporal gene expression profiles: clustering by simulated annealing and determining the optimal number of clusters

Bibliographic Details
Published in Bioinformatics Vol. 17; no. 5; pp. 405-414
Main Authors Lukashin, Alexander V.; Fuchs, Rainer
Format Journal Article
Language English
Published Oxford Oxford University Press 01.05.2001
Oxford Publishing Limited (England)
Summary:
Motivation: Cluster analysis of genome-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and samples. In the present paper, we focus on several important issues related to clustering algorithms that have not yet been fully studied.
Results: We describe a simple and robust algorithm for the clustering of temporal gene expression profiles that is based on the simulated annealing procedure. In general, this algorithm is guaranteed to eventually find the globally optimal distribution of genes over clusters. We introduce an iterative scheme that serves to evaluate quantitatively the optimal number of clusters for each specific data set. The scheme is based on standard approaches used in regular statistical tests. The basic idea is to organize the search for the optimal number of clusters simultaneously with the optimization of the distribution of genes over clusters. The efficiency of the proposed algorithm has been evaluated by means of a reverse engineering experiment, that is, a situation in which the correct distribution of genes over clusters is known a priori. This statistically rigorous test has shown that our algorithm places more than 90% of genes into the correct clusters. Finally, the algorithm has been tested on real gene expression data (expression changes during the yeast cell cycle) for which the fundamental patterns of gene expression and the assignment of genes to clusters are well understood from numerous previous studies.
Availability: The source code of the program implementing the algorithm is available upon request from the authors.
Contact: alex_lukashin@biogen.com
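Illustrative sketch (not part of the record): the Results text describes clustering expression profiles by simulated annealing, but the summary does not give the cost function, the move set, or the cooling schedule. The Python below is a minimal generic version under stated assumptions, not the authors' program. It assumes a cost equal to the sum of squared Euclidean distances from each profile to its cluster centroid, a move that reassigns one randomly chosen profile to a different cluster, Metropolis acceptance, and geometric cooling. The function names and all parameter values (`anneal_clusters`, `within_cluster_cost`, `t_start`, `t_end`, `cooling`, `moves_per_temp`) are hypothetical. A finite geometric schedule like this one is a heuristic and does not by itself guarantee the global optimum.

```python
import math
import random


def within_cluster_cost(profiles, labels, k):
    """Sum of squared Euclidean distances from each profile to its cluster centroid.

    Assumed cost function; the summary does not specify the one used in the paper.
    """
    dim = len(profiles[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for profile, c in zip(profiles, labels):
        counts[c] += 1
        for j, x in enumerate(profile):
            sums[c][j] += x
    cost = 0.0
    for profile, c in zip(profiles, labels):
        for j, x in enumerate(profile):
            cost += (x - sums[c][j] / counts[c]) ** 2
    return cost


def anneal_clusters(profiles, k, t_start=1.0, t_end=1e-3, cooling=0.95,
                    moves_per_temp=200, seed=0):
    """Assign each profile to one of k clusters by simulated annealing.

    Returns (best_labels, best_cost). All defaults are illustrative assumptions.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in profiles]  # random initial assignment
    cost = within_cluster_cost(profiles, labels, k)
    best_labels, best_cost = labels[:], cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            i = rng.randrange(len(profiles))
            old = labels[i]
            # Propose moving profile i to a different cluster.
            labels[i] = (old + rng.randrange(1, k)) % k
            new_cost = within_cluster_cost(profiles, labels, k)
            delta = new_cost - cost
            # Metropolis rule: always accept improvements; accept a worse
            # state with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best_labels, best_cost = labels[:], cost
            else:
                labels[i] = old  # reject the move
        t *= cooling  # geometric cooling (assumed schedule)
    return best_labels, best_cost
```

Calling `anneal_clusters(profiles, k=2)` on a list of equal-length numeric profiles returns a cluster label per profile and the corresponding cost. The sketch recomputes the full cost after every move for clarity. A practical implementation would update it incrementally, and would need a separate procedure for choosing `k`, which the summary describes only in outline.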
Bibliography:istex:AB9A22089291B5F83A964745CCD9D4DEB5D4C897
ark:/67375/HXZ-HRCJCQ48-W
PII:1460-2059
local:170405
ISSN:1367-4803
1460-2059
1367-4811
DOI:10.1093/bioinformatics/17.5.405