PASCAL: An EDA for parameterless shape-independent clustering

Bibliographic Details
Published in	2016 IEEE Congress on Evolutionary Computation (CEC) pp. 3433 - 3440
Main Authors	Cagnini, Henry E. L., Barros, Rodrigo C.
Format	Conference Proceeding
Language	English
Published	IEEE 01.07.2016
Subjects	clustering; Clustering algorithms; Encoding; estimation of distribution algorithms; machine learning; Optimization; Partitioning algorithms; Probabilistic logic; Sociology; Statistics

Summary:	Data clustering is an unsupervised learning task that can be regarded as the NP-hard combinatorial optimisation problem of assigning N objects to one (or more) of k clusters. Most data clustering algorithms require the user to set a number of pre-defined parameters that have a decisive impact on the formation of clusters, such as the number of clusters (initial or final), cluster radius, minimum number of objects, and similar parameters. In addition, several clustering algorithms are limited with regard to the shape of the clusters they can find, a limitation that usually results from the optimisation process being performed over a given distance metric. In this work, we propose a novel clustering algorithm that addresses both of these problems: the number of parameters and the cluster shape. Our approach makes use of the theory of Estimation of Distribution Algorithms to probabilistically sample a set of must-link/cannot-link constraints and thereby generate a data partition with the proper number of clusters. We name our method PASCAL, and we empirically show that it is capable not only of detecting the right number of clusters but also of properly assigning objects to the correct cluster in a variety of artificial and real problems whose solutions are known in advance.
DOI:	10.1109/CEC.2016.7744224
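
The summary describes sampling must-link/cannot-link constraints from a probabilistic model to obtain a partition whose number of clusters is not fixed in advance. The sketch below is only an illustration of that general idea and is not taken from the paper: the pairwise probability matrix `p_must_link`, the union-find merge, the PBIL-style `update_model` step, and every function name are assumptions made for this example. It omits the fitness function and the selection step, which the summary does not specify.

```python
import random


def sample_partition(p_must_link, rng):
    """Sample one partition from pairwise must-link probabilities.

    p_must_link[i][j] is an assumed probability that objects i and j
    must share a cluster; a pair that is not sampled as must-link is
    treated as cannot-link for that draw. Linked objects are merged with
    union-find, so the number of clusters emerges from the sample
    instead of being set by the user.
    """
    n = len(p_must_link)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p_must_link[i][j]:
                parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


def update_model(p_must_link, selected, lr=0.5):
    """Generic PBIL-style update (an assumption, not PASCAL's rule).

    Moves each pairwise probability toward the fraction of selected
    partitions in which the two objects share a cluster.
    """
    n = len(p_must_link)
    new_p = [row[:] for row in p_must_link]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            freq = sum(lab[i] == lab[j] for lab in selected) / len(selected)
            new_p[i][j] = (1 - lr) * p_must_link[i][j] + lr * freq
    return new_p
```

In a full Estimation of Distribution Algorithm, these two steps would alternate: sample a population of partitions, score them with some clustering quality measure, keep the best ones, and pass them to `update_model` as `selected`.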