Summarization and visualization of multi-level and multi-dimensional itemsets
Published in | Information Sciences, Vol. 520, pp. 63-85 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Elsevier Inc, 01.05.2020 |
Summary: | • Multi-level and multi-dimensional itemsets describe events at different levels of abstraction.
• We propose an approach to summarize and visualize frequent itemsets.
• We define a similarity function coupling feature- and support-based principles.
• We show that hierarchical summaries can be efficiently computed.
• We show that the visualization we propose is effective.
Frequent itemset (FI) mining aims at discovering relevant patterns from sets of transactions. In this work we focus on multi-level and multi-dimensional data, which provide a rich description of subjects through multiple features, each at different levels of detail. So far, the summarization of FIs has been addressed only marginally for multi-level and multi-dimensional FIs specifically. In this paper we fill this gap by proposing SUSHI, a framework for summarizing and visually exploring multi-level and multi-dimensional FIs. Specifically, SUSHI is based on (i) a similarity function for FIs which takes into account both their extensional (support-based) and intensional (feature-based) natures; (ii) theoretical results concerning antimonotonicity of support and similarity in multi-level settings, which allow us to propose an efficient clustering algorithm to generate hierarchical summaries; and (iii) two integrated approaches to summary visualization and exploration: a graph-based one, which highlights inter-cluster relationships, and a tree-based one, which emphasizes the relationships between the representative of each cluster and the other FIs in that cluster. SUSHI is evaluated using both a real and a synthetic dataset in terms of effectiveness, efficiency, and understandability of the summary, with reference to three different strategies for choosing cluster representatives. Overall, SUSHI is shown to outperform previous approaches and to be a valuable tool for expediting the analysis of FIs. In addition, one of the three strategies for choosing cluster representatives proves to be the most effective. |
---|---|
ISSN: | 0020-0255, 1872-6291 |
DOI: | 10.1016/j.ins.2020.02.006 |
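
The summary describes a similarity function that couples a feature-based (intensional) view of an itemset with a support-based (extensional) one. The record does not give the actual definition used in SUSHI, so the following is only a minimal sketch of the general idea under stated assumptions: both components are Jaccard indices, the hierarchy is a simple child-to-parent map, and the two components are mixed by a hypothetical weight `alpha`. The names `with_ancestors`, `fi_similarity`, and `alpha` are illustrative and do not come from the paper.

```python
from typing import Dict, Hashable, Set


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Jaccard index of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def with_ancestors(items: Set[str], parent: Dict[str, str]) -> Set[str]:
    """Expand items with all their ancestors in an (assumed acyclic) hierarchy."""
    expanded: Set[str] = set()
    for item in items:
        current = item
        while current is not None:
            expanded.add(current)
            current = parent.get(current)  # None once the root is reached
    return expanded


def fi_similarity(
    items_a: Set[str],
    items_b: Set[str],
    tids_a: Set[int],
    tids_b: Set[int],
    parent: Dict[str, str],
    alpha: float = 0.5,
) -> float:
    """Illustrative mix of feature-based and support-based similarity.

    ASSUMPTION: this weighted Jaccard combination is a stand-in for
    illustration, not the similarity function defined in the paper.
    """
    # Intensional part: overlap of items, counting shared ancestors.
    feature_sim = jaccard(with_ancestors(items_a, parent),
                          with_ancestors(items_b, parent))
    # Extensional part: overlap of the supporting transaction ids.
    support_sim = jaccard(tids_a, tids_b)
    return alpha * feature_sim + (1.0 - alpha) * support_sim


# Hypothetical two-level product hierarchy and two single-item itemsets.
parent = {"skim_milk": "milk", "whole_milk": "milk", "milk": "dairy"}
score = fi_similarity({"skim_milk"}, {"whole_milk"},
                      {1, 2, 3, 4}, {3, 4, 5}, parent)
print(f"{score:.2f}")  # → 0.45
```

In this toy example the two itemsets share no leaf item but share the ancestors `milk` and `dairy`, so the feature-based part is 2/4, while the transaction overlap gives 2/5; with `alpha = 0.5` the combined score is 0.45. A pairwise score of this kind is the sort of input a hierarchical clustering step could consume.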