Identification of co-regulated genes through Bayesian clustering of predicted regulatory binding sites

Bibliographic Details
Published in	Nature Biotechnology, Vol. 21, no. 4, pp. 435-439
Main Authors	Liu, Jun S; Qin, Zhaohui S; McCue, Lee Ann; Thompson, William; Mayerhofer, Linda; Lawrence, Charles E
Format	Journal Article
Language	English
Published	New York, NY Nature 01.04.2003 Nature Publishing Group
Subjects	Algorithms; Bacterial Proteins - genetics; Bayes Theorem; Biological and medical sciences; Cluster Analysis; Conserved Sequence - genetics; Escherichia coli - genetics; Fundamental and applied biological sciences. Psychology; Gene Expression Regulation, Bacterial - genetics; Models, Genetic; Molecular and cellular biology; Molecular genetics; Promoter Regions, Genetic - genetics; Regulon - genetics; Reproducibility of Results; Sequence Alignment - methods; Sequence Analysis, DNA - methods; Sequence Analysis, Protein - methods; Transcription. Transcription factor. Splicing. Rna processing; Regulation(control); Binding site; Gene; Transcription; Transcription factor
Summary:	The identification of co-regulated genes and their transcription-factor binding sites (TFBS) are key steps toward understanding transcription regulation. In addition to effective laboratory assays, various computational approaches for the detection of TFBS in promoter regions of coexpressed genes have been developed. The availability of complete genome sequences combined with the likelihood that transcription factors and their cognate sites are often conserved during evolution has led to the development of phylogenetic footprinting. The modus operandi of this technique is to search for conserved motifs upstream of orthologous genes from closely related species. The method can identify hundreds of TFBS without prior knowledge of co-regulation or coexpression. Because many of these predicted sites are likely to be bound by the same transcription factor, motifs with similar patterns can be put into clusters so as to infer the sets of co-regulated genes, that is, the regulons. This strategy utilizes only genome sequence information and is complementary to and confirmative of gene expression data generated by microarray experiments. However, the limited data available to characterize individual binding patterns, the variation in motif alignment, motif width, and base conservation, and the lack of knowledge of the number and sizes of regulons make this inference problem difficult. We have developed a Gibbs sampling-based Bayesian motif clustering (BMC) algorithm to address these challenges. Tests on simulated data sets show that BMC produces many fewer errors than hierarchical and K-means clustering methods. The application of BMC to hundreds of predicted gamma-proteobacterial motifs correctly identified many experimentally reported regulons, inferred the existence of previously unreported members of these regulons, and suggested novel regulons.
ISSN:	1087-0156; 1546-1696
DOI:	10.1038/nbt802