Mutual Enrichment in Ranked Lists and the Statistical Assessment of Position Weight Matrix Motifs
Published in | arXiv.org |
---|---|
Main Authors | , |
Format | Paper |
Language | English |
Published | Ithaca, Cornell University Library, arXiv.org, 30.07.2013 |
Subjects | |
Summary: | Statistical analysis of ranked lists is important for molecular biology measurement data such as ChIP-seq, which yields ranked lists of genomic sequences. State-of-the-art methods study fixed motifs in ranked lists; more flexible models, such as position weight matrix (PWM) motifs, are not addressed in this context. To assess the enrichment of a PWM motif in a ranked list, we use a PWM-induced second ranking on the same set of elements. Possible orders of one ranked list relative to the other are modeled by permutations. Because of the complexity of the sample space, it is difficult to characterize tail distributions in the group of permutations. In this paper we develop tight upper bounds on tail distributions of the size of the intersection of the tops of two uniformly and independently drawn permutations, and we demonstrate the advantages of this approach by using our software implementation, mmHG-Finder, to study PWMs in several datasets. |
---|---|
ISSN: | 2331-8422 |
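
The quantity named in the summary, the size of the intersection of the tops of two rankings, can be illustrated with a minimal Python sketch. It rests on one standard fact: for fixed cutoffs n1 and n2, the overlap between the top n1 of one uniform random permutation and the top n2 of an independent one follows a hypergeometric distribution. The function names and the exhaustive scan over cutoffs are assumptions made for illustration only; they are not taken from mmHG-Finder, and the sketch does not reproduce the upper bounds developed in the paper.

```python
from math import comb


def top_intersection_size(rank_a, rank_b, n1, n2):
    """Overlap between the top n1 of rank_a and the top n2 of rank_b."""
    return len(set(rank_a[:n1]) & set(rank_b[:n2]))


def hypergeom_tail(N, n1, n2, b):
    """P(overlap >= b) for two uniform, independent permutations of N items
    with fixed cutoffs n1 and n2 (hypergeometric upper tail)."""
    total = comb(N, n2)
    upper = min(n1, n2)
    hits = sum(comb(n1, k) * comb(N - n1, n2 - k) for k in range(b, upper + 1))
    return hits / total


def min_tail_over_cutoffs(rank_a, rank_b):
    """Hypothetical helper: scan all cutoff pairs and return the smallest
    hypergeometric tail together with the cutoffs that attain it.
    The result is NOT a p-value, because many cutoffs were tried."""
    N = len(rank_a)
    best = (1.0, 0, 0)
    for n1 in range(1, N):
        for n2 in range(1, N):
            b = top_intersection_size(rank_a, rank_b, n1, n2)
            p = hypergeom_tail(N, n1, n2, b)
            if p < best[0]:
                best = (p, n1, n2)
    return best


if __name__ == "__main__":
    measured = ["s1", "s2", "s3", "s4", "s5", "s6"]    # e.g. ranked by binding signal
    pwm_ranked = ["s2", "s1", "s3", "s6", "s4", "s5"]  # e.g. ranked by PWM score
    print(min_tail_over_cutoffs(measured, pwm_ranked))
```

Because the minimum is taken over many cutoff pairs, it has to be corrected before it can be read as a significance level. Bounding the tail of that kind of statistic is the problem the summary describes.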