Fuzzy high-utility pattern mining in parallel and distributed Hadoop framework

Bibliographic Details
Published in	Information sciences Vol. 553; pp. 31 - 48
Main Authors	Wu, Jimmy Ming-Tai, Srivastava, Gautam, Wei, Min, Yun, Unil, Lin, Jerry Chun-Wei
Format	Journal Article
Language	English
Published	Elsevier Inc 01.04.2021
Subjects	Big-data; Fuzzy-set theory; Hadoop; High fuzzy utility pattern; High utility itemset mining; MapReduce
Summary:	• An efficient algorithm, EFUPM, is proposed to discover fuzzy high-utility patterns.
• A Hadoop-based algorithm, HFUPM, is proposed to handle large-scale databases.
• Two upper bounds are designed to remove unpromising candidates early.
• Experiments show that the proposed algorithms achieve better performance.
Over the past decade, high-utility itemset mining (HUIM) has received widespread attention because it can surface more critical information than frequent itemset mining (FIM). Unfortunately, HUIM shares a key limitation with FIM: it selects itemsets using a binary model based on a pre-defined minimum utility threshold. Additionally, most previous HUIM work has focused on single, small datasets, which is unrealistic for today's real-world big-data environments. In this work, fuzzy-set theory and the MapReduce framework are combined to design a novel high fuzzy utility pattern mining algorithm that addresses both issues. First, fuzzy-set theory is applied in a new algorithm, efficient high fuzzy utility itemset mining (EFUPM), which discovers high fuzzy utility patterns on a single machine. Two upper bounds are then estimated to allow early pruning of unpromising candidates in the search space. To handle large-scale datasets, a Hadoop-based high fuzzy utility pattern mining (HFUPM) algorithm is then developed on the Hadoop framework. Experimental results show that the proposed algorithms perform strongly compared to current state-of-the-art approaches when mining high fuzzy utility patterns, both on a single machine and in a large-scale environment.
ISSN:	0020-0255 1872-6291
DOI:	10.1016/j.ins.2020.12.004