Extending common intervals searching from permutations to sequences

Bibliographic Details
Published in	Journal of discrete algorithms (Amsterdam, Netherlands) Vol. 29; pp. 27 - 46
Main Author	Rusu, Irena
Format	Journal Article
Language	English
Published	Elsevier B.V. 01.11.2014
Subjects	Algorithm; Common intervals; Genome; Permutation; Sequence

Summary:	Common intervals have been defined as a model of gene clusters in genomes represented either as permutations or as sequences. Whereas optimal algorithms for finding common intervals in permutations exist, even for an arbitrary number of permutations, no optimal algorithm has yet been proposed for sequences, even for only two sequences. Surprisingly, when sequences are reduced to permutations, the existing algorithms perform far from the optimum, showing that their performance does not depend, as it should, on the structural complexity of the input sequences. In this paper, we propose to characterize the structure of a sequence by the number q of different dominating orders composing it (called the domination number), and to use a recent algorithm for permutations to devise a new algorithm for two sequences. Its running time is in O(q1q2p + q1n1 + q2n2 + N), where n1 and n2 are the sizes of the two sequences, q1 and q2 are their respective domination numbers, p is the alphabet size, and N is the number of solutions to output. This algorithm performs better as q1 and/or q2 decrease, and when the two sequences are reduced to permutations (i.e. when q1 = q2 = 1) it has the same running time as the best algorithms for permutations. It is also the first algorithm for sequences whose running time involves the size of the solution as a parameter. On the other hand, when q1 and q2 are in O(n1) and O(n2) respectively, the algorithm is less efficient than other approaches.
ISSN:	1570-8667 1570-8675
DOI:	10.1016/j.jda.2014.10.004
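
For readers unfamiliar with the notion used in the summary, the following is a minimal sketch of what a common interval of two permutations is, using the standard definition: a set of elements that occupies consecutive positions in both permutations. It is a naive quadratic enumeration written for illustration only. It is not the algorithm proposed in the article and does not reach the optimal running time mentioned there; the function name `common_intervals` and the choice to report only intervals of at least two elements are assumptions made for this example.

```python
def common_intervals(p1, p2):
    """Naively list the common intervals of two permutations.

    Illustrative sketch only (hypothetical helper, not the article's method).
    Returns (i, j) index pairs such that the elements p1[i..j] (inclusive,
    j > i) also occupy consecutive positions in p2.
    Runs in O(n^2) time for permutations of length n.
    """
    if len(set(p1)) != len(p1) or sorted(p1) != sorted(p2):
        raise ValueError("inputs must be permutations of the same elements")

    # Position of every element in the second permutation.
    pos2 = {value: index for index, value in enumerate(p2)}

    n = len(p1)
    result = []
    for i in range(n):
        lo = hi = pos2[p1[i]]
        for j in range(i + 1, n):
            k = pos2[p1[j]]
            lo = min(lo, k)
            hi = max(hi, k)
            # p1[i..j] has j - i + 1 distinct elements; they are contiguous
            # in p2 exactly when they span j - i + 1 positions there.
            if hi - lo == j - i:
                result.append((i, j))
    return result


if __name__ == "__main__":
    # {1, 2}, {1, 2, 3, 4} and {3, 4} are contiguous in both permutations.
    print(common_intervals([1, 2, 3, 4], [2, 1, 4, 3]))
```

Two identical permutations of length n yield n(n-1)/2 such intervals, which is why an output-sensitive bound that includes the number N of solutions, as in the running time quoted in the summary, is the natural target.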