Incremental Maintenance of Biological Databases Using Association Rule Mining

Bibliographic Details
Published in	Pattern Recognition in Bioinformatics pp. 140 - 150
Main Authors	Lam, Kai-Tak; Koh, Judice L. Y.; Veeravalli, Bharadwaj; Brusic, Vladimir
Format	Book Chapter
Language	English
Published	Berlin, Heidelberg: Springer Berlin Heidelberg, 2006
Series	Lecture Notes in Computer Science
Subjects	Association Rule Mining; Complex Query; Frequent Itemsets; Original Query; Specialist Database
ISBN	9783540374466 3540374469
ISSN	0302-9743 1611-3349
DOI	10.1007/11818564_16

Summary:	Biological research frequently requires specialist databases to support in-depth analysis about specific subjects. With the rapid growth of biological sequences in public domain data sources, it is difficult to keep these databases current with the sources. Simple queries formulated to retrieve relevant sequences typically return a large number of false matches and thus demand manual filtration. In this paper, we propose a novel methodology that can support automatic incremental updating of specialist databases. Complex queries for incremental updating of relevant sequences are learned using Association Rule Mining (ARM), resulting in a significant reduction in false positive matches. This is the first time ARM has been used in formulating descriptive queries for the purpose of incremental maintenance of specialised biological databases. We have implemented and tested our methodology on two real-world databases. Our experiments conclusively show that the methodology guarantees an F-score of up to 80% in detecting new sequences for these two databases.