Summarization and visualization of multi-level and multi-dimensional itemsets

Bibliographic Details
Published in	Information Sciences, Vol. 520, pp. 63-85
Main Authors	Francia, Matteo; Golfarelli, Matteo; Rizzi, Stefano
Format	Journal Article
Language	English
Published	Elsevier Inc 01.05.2020
Subjects	Frequent itemset mining; Hierarchical clustering; Itemset summarization; Itemset visualization

More Information
Summary:	• Multi-level and multi-dimensional itemsets describe events at different levels of abstraction.
• We propose an approach to summarize and visualize frequent itemsets.
• We define a similarity function coupling feature- and support-based principles.
• We show that hierarchical summaries can be efficiently computed.
• We show that the visualization we propose is effective.
Frequent itemset (FI) mining aims at discovering relevant patterns from sets of transactions. In this work we focus on multi-level and multi-dimensional data, which provide a rich description of subjects through multiple features, each at different levels of detail. The summarization of FIs has so far been only marginally addressed with specific reference to multi-level and multi-dimensional FIs. In this paper we fill this gap by proposing SUSHI, a framework for summarizing and visually exploring multi-level and multi-dimensional FIs. Specifically, SUSHI is based on (i) a similarity function for FIs that takes into account both their extensional (support-based) and intensional (feature-based) natures; (ii) theoretical results concerning the antimonotonicity of support and similarity in multi-level settings, which allow us to propose an efficient clustering algorithm to generate hierarchical summaries; and (iii) two integrated approaches to summary visualization and exploration: a graph-based one, which highlights inter-cluster relationships, and a tree-based one, which emphasizes the relationships between the representative of each cluster and the other FIs in that cluster. SUSHI is evaluated on both a real and a synthetic dataset in terms of effectiveness, efficiency, and understandability of the summary, with reference to three different strategies for choosing cluster representatives. Overall, SUSHI is shown to outperform previous approaches and to be a valuable tool to expedite the analysis of FIs. In addition, one of the three strategies for choosing cluster representatives proves to be the most effective.
ISSN:	0020-0255 1872-6291
DOI:	10.1016/j.ins.2020.02.006