The exact joint distribution of the sum of heads and apparent size statistics of a "tandem repeats finder" algorithm


Bibliographic Details
Published in Bulletin of Mathematical Biology Vol. 68; no. 8; pp. 2353-2364
Main Author Martin, Donald E K
Format Journal Article
Language English
Published United States Springer Nature B.V 01.11.2006

Summary: Tandem repeats play many important roles in biological research. However, accurate characterization of their properties is limited by the inability to easily detect them. For this reason, much work has been devoted to developing detection algorithms. A widely used algorithm for detecting tandem repeats is the "tandem repeats finder" (Benson, G., Nucleic Acids Res. 27, 573-580, 1999). In that algorithm, tandem repeats are modeled by percent matches and frequency of indels between adjacent pattern copies, and statistical criteria are used to recognize them. We give a method for computing the exact joint distribution of a pair of statistics that are used in the testing procedures of the "tandem repeats finder": the total number of matches in matching tuples of length k or longer, and the total number of observations from the beginning of the first such matching tuple to the end of the last one. This allows the computation of the conditional distribution of the latter statistic given the former, a conditional distribution that is used to test for tandem repeats as opposed to non-tandem direct repeats. The setting is a Markovian sequence of a general order. Current approaches to this distributional problem deal only with independent trials and are based on approximations via simulation.
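
To make the two statistics in the summary concrete, the following is a minimal Python sketch and not the method of the article. It assumes matches are coded as 1 ("heads") and mismatches as 0, and it reads "matching tuples of length k or longer" as maximal runs of at least k consecutive 1s. The function names (`run_statistics`, `joint_distribution`, `conditional_apparent_size`) are hypothetical. The joint distribution here is obtained by brute-force enumeration over independent Bernoulli(p) trials, which is feasible only for short sequences; the article itself treats Markovian sequences of general order.

```python
from collections import defaultdict
from itertools import groupby, product


def run_statistics(matches, k):
    """Return (sum_of_heads, apparent_size) for a 0/1 sequence.

    sum_of_heads: total number of 1s lying in runs of length >= k.
    apparent_size: number of positions from the start of the first such
    run to the end of the last one (0 if no run qualifies).
    """
    total = 0
    first_start = None
    last_end = None
    pos = 0
    for value, group in groupby(matches):
        length = len(list(group))
        if value == 1 and length >= k:
            total += length
            if first_start is None:
                first_start = pos
            last_end = pos + length  # exclusive end index of this run
        pos += length
    if first_start is None:
        return 0, 0
    return total, last_end - first_start


def joint_distribution(n, k, p):
    """Exact joint pmf of the two statistics by enumerating all 2**n
    sequences. Assumption: independent trials with P(match) = p."""
    dist = defaultdict(float)
    for seq in product((0, 1), repeat=n):
        ones = sum(seq)
        prob = p ** ones * (1 - p) ** (n - ones)
        dist[run_statistics(seq, k)] += prob
    return dict(dist)


def conditional_apparent_size(dist, heads):
    """Conditional pmf of apparent size given the sum of heads."""
    marginal = sum(pr for (h, _), pr in dist.items() if h == heads)
    if marginal == 0:
        return {}
    return {a: pr / marginal for (h, a), pr in dist.items() if h == heads}
```

For example, `run_statistics([1, 1, 0, 1, 1, 1, 0, 0, 1, 1], 2)` counts seven matches in qualifying runs spanning all ten positions, and `conditional_apparent_size(joint_distribution(5, 2, 0.5), 4)` gives the conditional distribution of the span given four such matches.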
ISSN: 0092-8240, 1522-9602
DOI: 10.1007/s11538-006-9146-0