Fuzzy high-utility pattern mining in parallel and distributed Hadoop framework
Published in | Information sciences Vol. 553; pp. 31 - 48 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | Elsevier Inc 01.04.2021 |
Summary: | • An efficient algorithm, EFUPM, is proposed to discover high fuzzy utility patterns.
• A Hadoop-based algorithm, HFUPM, is proposed to handle large-scale databases.
• Two upper bounds are designed to prune unpromising candidates early.
• Experiments show that the proposed algorithms achieve better performance.
Over the past decade, high-utility itemset mining (HUIM) has received widespread attention because it can reveal more critical information than frequent itemset mining (FIM). Unfortunately, HUIM remains very similar to FIM in that it determines itemsets using a binary model based on a pre-defined minimum utility threshold. Additionally, most previous HUIM work has focused on single, small datasets, which is unrealistic for today's real-world big data environments. In this work, fuzzy-set theory and a MapReduce framework are combined to design a novel high fuzzy utility pattern mining algorithm that resolves these issues. Fuzzy-set theory is applied first, and a new algorithm called efficient high fuzzy utility itemset mining (EFUPM) is designed to discover high fuzzy utility patterns on a single machine. Two upper bounds are then estimated to allow early pruning of unpromising candidates in the search space. To handle large-scale datasets, a Hadoop-based high fuzzy utility pattern mining (HFUPM) algorithm is then developed to discover high fuzzy utility patterns on the Hadoop framework. Experimental results show that, compared with current state-of-the-art approaches, the proposed algorithms perform well at mining the required high fuzzy utility patterns, whether on a single machine or in a large-scale environment. |
---|---|
ISSN: | 0020-0255 1872-6291 |
DOI: | 10.1016/j.ins.2020.12.004 |
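
This record does not state how fuzzy utility is computed, so the following is only a rough sketch of the general idea behind the summary, not the EFUPM or HFUPM algorithm itself. It assumes three linguistic regions (Low, Middle, High) with simple piecewise-linear membership functions over purchase quantity, and it assumes that the fuzzy utility of a pattern in a transaction is the smallest membership degree among its items multiplied by the summed item utilities. The breakpoints (1, 6, 11), the function names, the sample data, and the absolute threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

Transaction = Dict[str, int]   # item -> purchased quantity
FuzzyItem = Tuple[str, str]    # (item, region), e.g. ("A", "Low")


def memberships(q: float) -> Dict[str, float]:
    """Assumed membership functions with hypothetical breakpoints 1, 6, 11."""
    return {
        "Low": max(0.0, min(1.0, (6 - q) / 5)),
        "Middle": max(0.0, 1 - abs(q - 6) / 5),
        "High": max(0.0, min(1.0, (q - 6) / 5)),
    }


def fuzzy_utility(pattern: List[FuzzyItem], tx: Transaction,
                  profit: Dict[str, float]) -> float:
    """Assumed definition: min membership degree times summed item utility."""
    if not pattern:
        return 0.0
    degrees = []
    utility = 0.0
    for item, region in pattern:
        if item not in tx:
            return 0.0  # the pattern does not occur in this transaction
        degrees.append(memberships(tx[item])[region])
        utility += tx[item] * profit[item]
    return min(degrees) * utility


def total_fuzzy_utility(pattern: List[FuzzyItem], db: List[Transaction],
                        profit: Dict[str, float]) -> float:
    """Sum the pattern's fuzzy utility over every transaction."""
    return sum(fuzzy_utility(pattern, tx, profit) for tx in db)


def is_high_fuzzy_utility(pattern: List[FuzzyItem], db: List[Transaction],
                          profit: Dict[str, float], min_utility: float) -> bool:
    """A pattern qualifies when its total reaches the (assumed absolute) threshold."""
    return total_fuzzy_utility(pattern, db, profit) >= min_utility


# Hypothetical toy data: unit profits and three transactions.
profit = {"A": 2.0, "B": 5.0}
db = [{"A": 3, "B": 1}, {"A": 6}, {"B": 8}]
print(is_high_fuzzy_utility([("A", "Middle")], db, profit, min_utility=10.0))
```

Because a quantity can belong partly to several regions, the same purchase contributes to more than one fuzzy pattern, which is the departure from the binary model that the summary describes. The two upper bounds and the Hadoop-based distribution mentioned in the summary are not sketched here, since the record gives no detail about them.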