TrieAMD: a scalable and efficient apriori motif discovery approach

Bibliographic Details
Published in International journal of data mining and bioinformatics Vol. 13; no. 1; p. 13
Main Authors Al-Turaiki, Isra; Badr, Ghada; Mathkour, Hassan
Format Journal Article
Language English
Published Switzerland 2015
Summary: Motif discovery is the problem of finding recurring patterns in biological sequences. It is one of the hardest and longest-standing problems in bioinformatics. Apriori is a well-known data-mining algorithm for discovering frequent patterns in large datasets. In this paper, we apply the Apriori algorithm and use the Trie data structure to discover motifs. We propose several modifications to adapt the classic Apriori to our problem. Experiments are conducted on Tompa's benchmark to investigate the performance of our proposed algorithm, the Trie-based Apriori Motif Discovery (TrieAMD). Results show that our algorithm outperforms all of the tested tools on real datasets in average sensitivity, which means that our approach is able to discover more motifs. In terms of specificity, the performance of our algorithm is comparable to that of the other tools. The results also confirm that the algorithm scales linearly in both time and space.
ISSN: 1748-5673
DOI: 10.1504/IJDMB.2015.070833
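
The summary pairs Apriori's level-wise candidate generation with a trie. The following is a minimal sketch of that general idea only, not the TrieAMD algorithm itself: the paper's modifications are not described in this record, and the sketch assumes exact substring matching, whereas real motif discovery usually tolerates mismatches. The names `TrieNode`, `lookup`, `count_support`, `frequent_patterns`, `min_support`, and `max_len` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Maps the next symbol to the child node that extends this pattern.
    children: dict = field(default_factory=dict)
    # Number of input sequences that contain the pattern ending at this node.
    support: int = 0


def lookup(root, pattern):
    """Return the trie node for `pattern`, or None if it was never stored."""
    node = root
    for symbol in pattern:
        node = node.children.get(symbol)
        if node is None:
            return None
    return node


def count_support(pattern, sequences):
    """Count the sequences that contain `pattern` as an exact substring."""
    return sum(pattern in seq for seq in sequences)


def frequent_patterns(sequences, min_support, max_len):
    """Level-wise search; returns (trie root, {pattern: support})."""
    root = TrieNode()
    frequent = {}
    level = []
    # Level 1: keep every symbol that occurs in enough sequences.
    for symbol in sorted(set("".join(sequences))):
        support = count_support(symbol, sequences)
        if support >= min_support:
            root.children[symbol] = TrieNode(support=support)
            frequent[symbol] = support
            level.append(symbol)
    symbols = list(level)
    length = 1
    while level and length < max_len:
        next_level = []
        for pattern in level:
            parent = lookup(root, pattern)
            for symbol in symbols:
                candidate = pattern + symbol
                # Apriori pruning: a frequent pattern needs a frequent suffix,
                # so skip candidates whose suffix is absent from the trie.
                if lookup(root, candidate[1:]) is None:
                    continue
                support = count_support(candidate, sequences)
                if support >= min_support:
                    parent.children[symbol] = TrieNode(support=support)
                    frequent[candidate] = support
                    next_level.append(candidate)
        level = next_level
        length += 1
    return root, frequent


sequences = ["ACGTAC", "TACGA", "GGACGT"]
root, frequent = frequent_patterns(sequences, min_support=3, max_len=3)
print(sorted(p for p in frequent if len(p) > 1))  # ['AC', 'ACG', 'CG']
```

Storing only frequent patterns in the trie lets the pruning step double as a membership test: a candidate is counted against the sequences only when its suffix is already a path in the trie.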