TrieAMD: a scalable and efficient apriori motif discovery approach
Published in | International journal of data mining and bioinformatics Vol. 13; no. 1; p. 13 |
---|---|
Format | Journal Article |
Language | English |
Published | Switzerland 2015 |
Summary: | Motif discovery is the problem of finding recurring patterns in biological sequences. It is one of the hardest and most long-standing problems in bioinformatics. Apriori is a well-known data-mining algorithm for discovering frequent patterns in large datasets. In this paper, we apply the Apriori algorithm and use the Trie data structure to discover motifs. We propose several modifications to adapt the classic Apriori algorithm to this problem. Experiments are conducted on Tompa's benchmark to investigate the performance of our proposed algorithm, the Trie-based Apriori Motif Discovery (TrieAMD). Results show that our algorithm outperforms all of the tested tools on real datasets for the average sensitivity measure, which means that our approach is able to discover more motifs. In terms of specificity, the performance of our algorithm is comparable to that of the other tools. The results also confirm that the algorithm scales linearly in both time and space. |
---|---|
ISSN: | 1748-5673 |
DOI: | 10.1504/IJDMB.2015.070833 |
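
The summary above describes combining Apriori-style level-wise search with a trie. The record does not give the details of TrieAMD, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes exact (mismatch-free) matching over the DNA alphabet and defines support as the number of sequences that contain a pattern. The names `TrieNode`, `count_support`, and `frequent_patterns` are hypothetical and chosen for illustration.

```python
ALPHABET = "ACGT"  # assumption: DNA sequences only


class TrieNode:
    """One node per frequent pattern; the path from the root spells the pattern."""

    def __init__(self):
        self.children = {}
        self.support = 0


def count_support(sequences, pattern):
    # Support = number of sequences containing the pattern as an exact substring.
    return sum(1 for seq in sequences if pattern in seq)


def frequent_patterns(sequences, min_support, max_len):
    """Grow a trie level by level, keeping only patterns with enough support."""
    root = TrieNode()
    frequent = {}            # pattern -> support, for every frequent pattern found
    frontier = [("", root)]  # frequent patterns of the current length

    for _ in range(max_len):
        next_frontier = []
        for prefix, node in frontier:
            for base in ALPHABET:
                candidate = prefix + base
                # Apriori pruning: a pattern can only be frequent if its
                # shorter sub-patterns are. The prefix is frequent by
                # construction, so only the suffix needs checking.
                if len(candidate) > 1 and candidate[1:] not in frequent:
                    continue
                support = count_support(sequences, candidate)
                if support >= min_support:
                    child = TrieNode()
                    child.support = support
                    node.children[base] = child
                    frequent[candidate] = support
                    next_frontier.append((candidate, child))
        if not next_frontier:
            break  # no pattern of this length survived, so none longer can
        frontier = next_frontier

    return root, frequent
```

Calling `frequent_patterns(sequences, min_support, max_len)` on a list of strings returns the trie root and a dictionary of frequent patterns with their supports. A realistic motif finder would also need to tolerate mismatches and rank candidates statistically, which this sketch deliberately leaves out.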