PARASOL: a hybrid approximation approach for scalable frequent itemset mining in streaming data

Here, we present a novel algorithm for frequent itemset mining in streaming data (FIM-SD). For the past decade, various FIM-SD methods in one-pass approximation settings that allow to approximate the support of each itemset have been proposed. They can be categorized into two approximation types: pa...

Full description

Saved in:
Bibliographic Details
Published inJournal of intelligent information systems Vol. 55; no. 1; pp. 119 - 147
Main Authors Yamamoto, Yoshitaka, Tabei, Yasuo, Iwanuma, Koji
Format Journal Article
LanguageEnglish
Published New York Springer US 01.08.2020
Springer Nature B.V
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Here, we present a novel algorithm for frequent itemset mining in streaming data (FIM-SD). For the past decade, various FIM-SD methods in one-pass approximation settings that allow to approximate the support of each itemset have been proposed. They can be categorized into two approximation types: parameter-constrained (PC) mining and resource-constrained (RC) mining. PC methods control the maximum error that can be included in the approximate support based on a pre-defined parameter. In contrast, RC methods limit the maximum memory consumption based on resource constraints. However, the existing PC methods can exponentially increase the memory consumption, while the existing RC methods can rapidly increase the maximum error. In this study, we address this problem by introducing a hybrid approach of PC-RC approximations, called PARASOL . For any streaming data, PARASOL ensures to provide a condensed representation, called a Δ-covered set , which is regarded as an extension of the closedness compression; when Δ = 0, the solution corresponds to the ordinary closed itemsets. PARASOL searches for such approximate closed itemsets that can restore the frequent itemsets and their supports while the maximum error is bounded by an integer, Δ. Then, we empirically demonstrate that the proposed algorithm significantly outperforms the state-of-the-art PC and RC methods for FIM-SD.
ISSN:0925-9902
1573-7675
DOI:10.1007/s10844-019-00590-9