Supervised Detection of Regulatory Motifs in DNA Sequences

Bibliographic Details
Published in: Statistical Applications in Genetics and Molecular Biology, Vol. 2; no. 1; Article 5
Main Authors: Keles, Sunduz; van der Laan, Mark J.; Dudoit, Sandrine; Xing, Biao; Eisen, Michael B.
Format: Journal Article
Language: English
Published: bepress; De Gruyter (Germany), 25.08.2003

Summary: Identification of transcription factor binding sites (regulatory motifs) is a major interest in contemporary biology. We propose a new likelihood-based method, COMODE, for identifying structural motifs in DNA sequences. Commonly used methods (e.g., MEME and the Gibbs motif sampler) model binding sites as families of sequences described by a position weight matrix (PWM) and identify PWMs that maximize the likelihood of the observed sequence data under a simple multinomial mixture model. This model assumes that the positions of the PWM correspond to independent multinomial distributions with four cell probabilities. We address the problem of supervising the search for DNA binding sites using information derived from structural characteristics of protein-DNA interactions. We extend the simple multinomial mixture model to a constrained multinomial mixture model by incorporating constraints on the information content profiles or on specific parameters of the motif PWMs. The parameters of this extended model are estimated by maximum likelihood using a nonlinear constrained optimization method. Likelihood-based cross-validation is used to select model parameters such as motif width and constraint type. The performance of COMODE is compared with existing motif detection methods on simulated data that incorporate real motif examples from Saccharomyces cerevisiae. The proposed method is especially effective when the motif of interest appears as a weak signal in the data. Some of the transcription factor binding data of Lee et al. (2002) were also analyzed using COMODE, and biologically verified sites were identified.

Submitted: May 15, 2003 · Accepted: July 28, 2003 · Published: August 25, 2003

Recommended Citation: Keles, Sunduz; van der Laan, Mark J.; Dudoit, Sandrine; Xing, Biao; and Eisen, Michael B. (2003) "Supervised Detection of Regulatory Motifs in DNA Sequences," Statistical Applications in Genetics and Molecular Biology: Vol. 2: Iss. 1, Article 5.
DOI: 10.2202/1544-6115.1015
Available at: http://www.bepress.com/sagmb/vol2/iss1/art5
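The summary says COMODE constrains the information content (IC) profile of the motif PWM. The Python sketch below is for illustration only. It computes a per-position IC profile as the relative entropy, in bits, of each PWM column against a background distribution. It then checks one hypothetical shape constraint: a U-shaped profile, meaning conserved ends and a variable middle. The function names, the uniform background, and the specific shape test are assumptions. They are not COMODE's actual constraint set or code.

```python
import math

def information_content_profile(pwm, background=(0.25, 0.25, 0.25, 0.25)):
    """Per-position information content (bits) of a PWM.

    pwm: list of rows, one per motif position, each a length-4 list of
    probabilities for A, C, G, T. IC_j = sum_b p_jb * log2(p_jb / q_b),
    the relative entropy against the background q; with a uniform
    background this equals 2 + sum_b p_jb * log2(p_jb).
    """
    profile = []
    for row in pwm:
        # terms with p == 0 contribute 0 (convention 0 * log 0 = 0)
        ic = sum(p * math.log2(p / q) for p, q in zip(row, background) if p > 0)
        profile.append(ic)
    return profile

def is_u_shaped(profile, tol=1e-12):
    """Hypothetical shape check: non-increasing up to the first minimum,
    then non-decreasing afterwards."""
    j = min(range(len(profile)), key=profile.__getitem__)
    left = all(profile[i] >= profile[i + 1] - tol for i in range(j))
    right = all(profile[i] <= profile[i + 1] + tol
                for i in range(j, len(profile) - 1))
    return left and right

# Toy width-3 PWM (assumed values): conserved A, uninformative middle, conserved T.
pwm = [
    [0.97, 0.01, 0.01, 0.01],
    [0.25, 0.25, 0.25, 0.25],
    [0.01, 0.01, 0.01, 0.97],
]
profile = information_content_profile(pwm)
print([round(x, 3) for x in profile])
print(is_u_shaped(profile))
```

In a constrained likelihood fit, a check like this would become a set of inequality constraints on the PWM parameters passed to the optimizer, rather than a pass/fail test applied after fitting.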
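The summary describes the baseline as a simple multinomial mixture model in which each PWM position is an independent multinomial over A, C, G, T. The following Python code is a minimal sketch of one common variant of that model, one occurrence per sequence. The motif start is uniformly distributed over the valid positions, motif positions emit bases from the PWM columns, and all other bases follow a background multinomial. The variant, the function names, and the toy data are assumptions for illustration. COMODE's exact mixture formulation and its constrained optimizer are not reproduced here.

```python
import math

BASES = "ACGT"

def window_log_terms(seq, pwm, background):
    """For each candidate start of a width-W motif, return
    log P(seq | motif starts there): motif position j emits bases from
    the multinomial pwm[j]; all other bases come from the background."""
    idx = [BASES.index(b) for b in seq]
    width = len(pwm)
    log_bg = [math.log(background[k]) for k in idx]
    total_bg = sum(log_bg)
    terms = []
    for start in range(len(idx) - width + 1):
        s = total_bg
        for j in range(width):
            # swap the background term for the motif term at this position
            s += math.log(pwm[j][idx[start + j]]) - log_bg[start + j]
        terms.append(s)
    return terms

def log_mean_exp(terms):
    """Numerically stable log of the average of exp(terms)."""
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms) / len(terms))

def oops_log_likelihood(seq, pwm, background):
    """Mixture log-likelihood with a uniform prior over start positions."""
    return log_mean_exp(window_log_terms(seq, pwm, background))

def start_posteriors(seq, pwm, background):
    """Posterior probability of each start position (an EM-style E-step)."""
    terms = window_log_terms(seq, pwm, background)
    m = max(terms)
    weights = [math.exp(t - m) for t in terms]
    z = sum(weights)
    return [w / z for w in weights]

uniform = [0.25, 0.25, 0.25, 0.25]
pwm = [
    [0.97, 0.01, 0.01, 0.01],  # position 1 strongly prefers A
    [0.01, 0.01, 0.01, 0.97],  # position 2 strongly prefers T
]
seq = "CCATCC"
print(oops_log_likelihood(seq, pwm, uniform))
print(oops_log_likelihood(seq, [uniform, uniform], uniform))
post = start_posteriors(seq, pwm, uniform)
print(post.index(max(post)))  # → 2, the start of the planted "AT"
```

Maximizing the summed log-likelihood over many sequences with respect to the PWM entries gives the unconstrained fit. The constrained model described above would restrict that maximization, for example to PWMs whose IC profile has a prescribed shape.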
ISSN: 1544-6115