Supervised Detection of Regulatory Motifs in DNA Sequences
Published in | Statistical Applications in Genetics and Molecular Biology Vol. 2; no. 1; Article 5 |
---|---|
Main Authors | Keles, Sunduz; van der Laan, Mark J.; Dudoit, Sandrine; Xing, Biao; Eisen, Michael B. |
Format | Journal Article |
Language | English |
Published | Germany, bepress, 25.08.2003, De Gruyter |
Summary: | Abstract
Identification of transcription factor binding
sites (regulatory motifs) is a major interest in contemporary
biology. We propose a new likelihood-based method, COMODE, for
identifying structural motifs in DNA sequences.
Commonly used methods (e.g. MEME, Gibbs motif sampler) model binding
sites as families of sequences described by a position weight
matrix (PWM) and identify PWMs that maximize the likelihood of
observed sequence data under a simple multinomial mixture model. This model
assumes that the positions of the PWM correspond to independent
multinomial distributions with four cell probabilities.
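
To make the model concrete, the following is a minimal sketch of a two-component multinomial mixture likelihood over sequence windows, in the general style of MEME. It is an illustration under stated assumptions, not the COMODE implementation: the function names, the uniform background, the mixing weight `mix`, and the toy sequences are all hypothetical.

```python
import math

BASES = "ACGT"


def window_prob(window, pwm):
    """Probability of a W-mer when every position is an independent
    multinomial over A, C, G, T (one PWM row per position)."""
    p = 1.0
    for j, base in enumerate(window):
        p *= pwm[j][BASES.index(base)]
    return p


def mixture_log_likelihood(sequences, pwm, background, mix):
    """Log-likelihood of all width-W windows under a two-component mixture:
    with probability `mix` a window comes from the motif PWM, otherwise from
    the background. This mixture form is an assumption made for illustration."""
    width = len(pwm)
    bg_pwm = [background] * width  # background: same distribution at every position
    total = 0.0
    for seq in sequences:
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            p_motif = window_prob(window, pwm)
            p_bg = window_prob(window, bg_pwm)
            total += math.log(mix * p_motif + (1.0 - mix) * p_bg)
    return total


# Toy data (hypothetical): two short sequences that each contain "TGA".
sequences = ["ACTGAC", "TTGACA"]
uniform = [0.25, 0.25, 0.25, 0.25]

# PWM with consensus T-G-A; each row lists probabilities for A, C, G, T.
tga_pwm = [
    [0.05, 0.05, 0.05, 0.85],  # position 1 favors T
    [0.05, 0.05, 0.85, 0.05],  # position 2 favors G
    [0.85, 0.05, 0.05, 0.05],  # position 3 favors A
]
flat_pwm = [uniform] * 3

ll_motif = mixture_log_likelihood(sequences, tga_pwm, uniform, mix=0.2)
ll_flat = mixture_log_likelihood(sequences, flat_pwm, uniform, mix=0.2)
```

On this toy input the PWM that matches the planted word yields a higher log-likelihood than a flat PWM, which is the quantity a likelihood-maximizing motif finder would search over.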
We address the problem of supervising the search for
DNA binding sites using information
derived from structural characteristics of protein-DNA
interactions. We extend the simple multinomial mixture model to a
constrained multinomial mixture model
by incorporating constraints on the information content profiles or
on specific parameters of the motif PWMs. The parameters of this
extended model are estimated by maximum likelihood using a
nonlinear constraint optimization method. Likelihood-based
cross-validation is used to select model parameters such as motif
width and constraint type.
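
As an illustration of what an information content profile is, the sketch below computes the per-position information content of a PWM in bits, assuming a uniform background over the four bases, and checks one possible constraint on it. The abstract does not specify the constraints COMODE uses, so `meets_lower_bounds` and the example bounds are hypothetical stand-ins.

```python
import math


def information_content(column):
    """Information content (bits) of one PWM column relative to a uniform
    background: 2 + sum_b p_b * log2(p_b). Zero-probability cells contribute 0."""
    return 2.0 + sum(p * math.log2(p) for p in column if p > 0.0)


def ic_profile(pwm):
    """Information content at each motif position."""
    return [information_content(column) for column in pwm]


def meets_lower_bounds(profile, lower_bounds):
    """Hypothetical constraint: each position must carry at least the given
    number of bits. Real constraints may take a different form."""
    return all(ic >= bound for ic, bound in zip(profile, lower_bounds))


# Toy PWM (hypothetical): conserved ends, degenerate middle position.
pwm = [
    [1.0, 0.0, 0.0, 0.0],      # fully conserved: 2 bits
    [0.25, 0.25, 0.25, 0.25],  # uninformative: 0 bits
    [0.5, 0.5, 0.0, 0.0],      # two equally likely bases: 1 bit
]

profile = ic_profile(pwm)
ok = meets_lower_bounds(profile, [1.5, 0.0, 0.5])
```

In a constrained fit, a check like this would become a constraint function handed to a nonlinear optimizer rather than a filter applied after the fact.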
The performance of COMODE is compared with
existing motif detection methods on simulated data that
incorporate real motif examples from
Saccharomyces cerevisiae. The proposed method is
especially effective when the motif of interest appears as a
weak signal in the data. Some of the transcription
factor binding data of Lee et al. (2002) were also analyzed using COMODE
and biologically verified sites were identified.
Submitted: May 15, 2003 · Accepted: July 28, 2003 · Published: August 25, 2003
Recommended Citation
Keles, Sunduz; van der Laan, Mark J.; Dudoit, Sandrine; Xing, Biao; and Eisen, Michael B. (2003) "Supervised Detection of Regulatory Motifs in DNA Sequences," Statistical Applications in Genetics and Molecular Biology: Vol. 2: Iss. 1, Article 5.
DOI: 10.2202/1544-6115.1015
Available at: http://www.bepress.com/sagmb/vol2/iss1/art5 |
---|---|
Bibliography: | ark:/67375/QT4-B8802ZF2-L sagmb.2003.2.1.1015.pdf istex:572622BA9B65AD2390FDB27FB25139D56325359E ArticleID:1544-6115.1015 |
ISSN: | 1544-6115 |
DOI: | 10.2202/1544-6115.1015 |