Agglomerative Clustering in Uniform and Proportional Feature Spaces

Bibliographic Details
Main Authors	Benatti, Alexandre; Costa, Luciano da F
Format	Journal Article
Language	English
Published	11.07.2024
Subjects	Physics - Physics and Society
Summary:	Pattern comparison is a fundamental aspect of scientific modeling, artificial intelligence, and pattern recognition. Four main approaches have typically been applied to pattern comparison: (i) distances; (ii) statistical joint variation; (iii) projections; and (iv) similarity indices, each with its own specific characteristics. In addition to arguing for the intrinsically interesting properties of multiset-based similarity approaches, the present work describes a hierarchical agglomerative clustering approach based on the coincidence similarity index, which inherits several of that index's characteristics, including strict comparisons that can distinguish between closely similar patterns, inherent normalization, and substantial robustness to noise and outliers in datasets. Two other hierarchical clustering approaches are also considered, namely a multiset-based method and the traditional Ward's approach. After characterizing uniform and proportional feature spaces and presenting the main concepts and methods, the relative performance of the three hierarchical methods is compared and discussed. In particular, though intrinsically suited to proportional comparisons, the coincidence similarity methodology also works effectively on several types of data in uniform feature spaces.
DOI:	10.48550/arxiv.2407.08604
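
The summary names the coincidence similarity index and an agglomerative procedure built on it, but this record does not state either formula. The following is a minimal illustrative sketch only. It assumes the commonly cited form of the coincidence index for non-negative features, namely the product of the real-valued Jaccard index and the interiority (overlap) index, and it uses plain average linkage on that similarity as a stand-in; it should not be read as the authors' clustering method. The function names (`jaccard`, `interiority`, `coincidence`, `agglomerate`) and the sample points are hypothetical.

```python
def jaccard(x, y):
    """Real-valued Jaccard index: sum of minima over sum of maxima.
    Assumes non-negative feature values."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return num / den if den > 0 else 0.0


def interiority(x, y):
    """Interiority (overlap) index: sum of minima over the smaller total."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = min(sum(x), sum(y))
    return num / den if den > 0 else 0.0


def coincidence(x, y):
    """Assumed form of the coincidence index: Jaccard times interiority."""
    return jaccard(x, y) * interiority(x, y)


def agglomerate(points, n_clusters):
    """Naive agglomerative clustering with average linkage on similarity.
    Repeatedly merges the two clusters with the highest mean pairwise
    coincidence until n_clusters remain. Returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def avg_sim(a, b):
        total = sum(coincidence(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Pick the most similar pair of clusters (p < q).
        _, p, q = max(
            (avg_sim(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# Hypothetical data: two visually separate groups of 2-D points.
points = [(1.0, 2.0), (1.1, 2.1), (5.0, 0.5), (5.2, 0.4)]
groups = agglomerate(points, 2)
```

Under the assumed definition, the index is bounded between 0 and 1 for non-negative features and depends on ratios of feature values rather than their absolute differences, which is one way to read the "inherent normalization" and "proportional comparisons" mentioned in the summary.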