PASCAL: An EDA for parameterless shape-independent clustering
Published in | 2016 IEEE Congress on Evolutionary Computation (CEC), pp. 3433-3440 |
---|---|
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.07.2016 |
Summary: | Data clustering is an unsupervised learning task that can be regarded as the NP-hard combinatorial optimisation problem of assigning N objects to one (or more) of k clusters. Most data clustering algorithms require the user to set a number of pre-defined parameters that have a decisive impact on the formation of clusters, such as the number of clusters (initial or final), cluster radius, minimum number of objects, and similar parameters. In addition, several clustering algorithms are limited in the shapes of clusters they can find, a limitation that usually results from the optimisation process being performed over a given distance metric. In this work, we propose a novel clustering algorithm that addresses both of these problems: the number of parameters and the restriction on cluster shape. Our approach uses the theory of Estimation of Distribution Algorithms to probabilistically sample a set of must-link/cannot-link constraints in order to generate a data partition with the proper number of clusters. We name our method PASCAL, and we empirically show that it is capable not only of detecting the right number of clusters but also of properly assigning objects to the correct cluster in a variety of artificial and real problems whose solutions are known in advance. |
---|---|
DOI: | 10.1109/CEC.2016.7744224 |
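
The summary describes sampling must-link/cannot-link constraints so that the number of clusters emerges from the constraints rather than being set by the user. The following is a minimal Python sketch of that general idea only, not the PASCAL algorithm itself, whose details are not given in this record. The function name `sample_partition`, the probability matrix `p_link`, and the use of a union-find structure are all assumptions made for illustration. A full Estimation of Distribution Algorithm would additionally score the sampled partitions and update `p_link` from the best ones; that loop is omitted here.

```python
import random


def sample_partition(n, p_link, rng):
    """Sample pairwise constraints and return one cluster label per object.

    Hypothetical illustration: p_link[i][j] is the probability that objects
    i and j receive a must-link constraint. A pair that is not must-linked is
    treated as cannot-link, but this simplified sketch does not enforce
    cannot-link pairs: two such objects can still end up together through a
    chain of must-links.
    """
    parent = list(range(n))  # union-find forest, one singleton set per object

    def find(x):
        # Follow parent pointers to the set representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p_link[i][j]:  # sampled a must-link constraint
                parent[find(i)] = find(j)    # merge the two groups

    # Relabel set representatives as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


if __name__ == "__main__":
    # Toy probability model: objects 0-2 are always linked, as are objects 3-4.
    groups = [0, 0, 0, 1, 1]
    p = [[1.0 if groups[i] == groups[j] else 0.0 for j in range(5)]
         for i in range(5)]
    labels = sample_partition(5, p, random.Random(0))
    print(labels, "clusters:", len(set(labels)))
```

In this sketch the number of clusters is never passed in; it is simply the number of connected groups left after the sampled must-link constraints are merged.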