A fully Bayesian model to cluster gene-expression profiles

Bibliographic Details
Published in	Bioinformatics, Vol. 21, no. suppl. 2, pp. ii130-ii136
Main Authors	Vogl, C., Sanchez-Cabo, F., Stocker, G., Hubbard, S., Wolkenhauer, O., Trajanoski, Z.
Format	Journal Article
Language	English
Published	England: Oxford University Press; Oxford Publishing Limited (England), 01.09.2005
Subjects	Algorithms; Artificial Intelligence; Bayes Theorem; Cluster Analysis; Computer Simulation; Gene Expression Profiling - methods; Models, Genetic; Multigene Family - physiology; Oligonucleotide Array Sequence Analysis - methods; Pattern Recognition, Automated - methods
Summary:	Motivation: With cDNA or oligonucleotide chips, gene-expression levels of essentially all genes in a genome can be simultaneously monitored over a time-course or under different experimental conditions. After proper normalization of the data, genes are often classified into co-expressed classes (clusters) to identify subgroups of genes that share common regulatory elements, a common function or a common cellular origin. With most methods, e.g. k-means, the number of clusters needs to be specified in advance; results depend strongly on this choice. Even with likelihood-based methods, estimation of this number is difficult. Furthermore, missing values often cause problems and lead to the loss of data.
Results: We propose a fully probabilistic Bayesian model to cluster gene-expression profiles. The number of classes does not need to be specified in advance; instead it is adjusted dynamically using a Reversible Jump Markov Chain Monte Carlo sampler. Imputation of missing values is integrated into the model. With simulations, we determined the speed of convergence of the sampler as well as the accuracy of the inferred variables. Results were compared with the widely used k-means algorithm. With our method, biologically related co-expressed genes could be identified in a yeast transcriptome dataset, even when some values were missing.
Availability: The code is available at http://genome.tugraz.at/BayesianClustering/
Contact: claus.vogl@vu-wien.ac.at
Supplementary information: The supplementary material is available at http://genome.tugraz.at/BayesianClustering/
Bibliography:	local:bti1122; istex:0979CB171071584A2F0284FBC6BF8C89CA53BE74; ark:/67375/HXZ-7J9N7KZK-8
ISSN:	1367-4803, 1460-2059, 1367-4811
DOI:	10.1093/bioinformatics/bti1122
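
The summary notes that with k-means the number of clusters has to be fixed in advance and that results depend strongly on that choice. The sketch below illustrates only that baseline behaviour, not the authors' Bayesian model or their sampler. Everything in it is an assumption made for illustration: the three expression templates, the noise level, and the helper names `simulate_profiles`, `sq_dist` and `kmeans` are hypothetical and do not come from the paper or its code.

```python
import random


def simulate_profiles(n_per_cluster=30, noise_sd=0.3, seed=0):
    """Simulate noisy time-course profiles around three made-up templates."""
    rng = random.Random(seed)
    templates = [
        [0.0, 1.0, 2.0, 2.0, 1.0, 0.0],  # transient induction (hypothetical)
        [2.0, 1.0, 0.0, 0.0, 1.0, 2.0],  # transient repression (hypothetical)
        [0.0, 0.0, 0.0, 1.0, 2.0, 3.0],  # late induction (hypothetical)
    ]
    profiles = []
    for template in templates:
        for _ in range(n_per_cluster):
            profiles.append([v + rng.gauss(0.0, noise_sd) for v in template])
    return profiles


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data, k, seed=0, max_iter=100):
    """Plain Lloyd's k-means; returns (labels, within-cluster sum of squares)."""
    rng = random.Random(seed)
    # Initialise the centers with k distinct profiles drawn from the data.
    centers = [list(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in data
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    wcss = sum(sq_dist(p, centers[lab]) for p, lab in zip(data, labels))
    return labels, wcss


profiles = simulate_profiles()
# The user must choose k; here several values are tried on the same data.
wcss_by_k = {k: kmeans(profiles, k)[1] for k in range(1, 7)}
for k, wcss in wcss_by_k.items():
    print(k, round(wcss, 1))
```

In runs like this, the within-cluster sum of squares typically keeps decreasing as k grows beyond the number of simulated groups, so the fit criterion alone does not indicate how many clusters are present. That is the difficulty the summary describes for methods that need the number of clusters specified in advance.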