Semi-supervised data clustering using particle swarm optimisation

Bibliographic Details
Published in: Soft Computing (Berlin, Germany), Vol. 24, No. 5, pp. 3499–3510
Main authors: Lai, Daphne T. C.; Miyakawa, Minami; Sato, Yuji
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg; Springer Nature B.V., 01.03.2020

Summary: In this study, we propose the semi-supervised particle swarm optimisation (ssPSO) algorithm for data clustering. The algorithm combines the strengths of semi-supervised fuzzy c-means (ssFCM) and particle swarm optimisation (PSO), using labelled data to inform the search over a small number of iterations while maintaining diversity in the search process. ssFCM algorithms can find meaningful clusters by using the available labelled data to guide the learning process. PSOs are often chosen for clustering problems because of their versatility in problem representation and their exploration capabilities. To verify the effectiveness of ssPSOs and to provide practical insights for researchers, we investigated the clustering performance and clustering behaviour of ssPSOs and compared them with those of PSO variants and ssFCMs. Two ssPSO approaches were studied: one applied at initialisation only, and the other applied throughout the learning process. The ssPSO, PSO and ssFCM algorithms were evaluated on accuracy and quantisation error (QE) using 13 UCI datasets of varying size, dimensionality, number of classes and distribution, exploring several swarm-size and maximum-iteration settings over 100 runs. Biplots and convergence graphs were also examined visually. With a swarm size of 20 and a maximum of 20 iterations, ssPSOs performed competitively with ssFCM in terms of accuracy on most datasets and outperformed ssFCM in terms of QE. The results show that ssPSOs perform particularly well on sparsely distributed datasets with overlapping clusters and produce better-structured clusters in terms of QE. Furthermore, ssPSOs performed as competitively as ssFCM on datasets with more than three clusters, whereas QPSO performed poorly on such datasets.
ISSN: 1432-7643; 1433-7479
DOI: 10.1007/s00500-019-04114-z
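The summary above describes clustering with PSO, scored by quantisation error, but gives no formulas. The following Python sketch is a generic illustration under stated assumptions, not the authors' ssPSO. Each particle encodes k candidate centroids. Fitness is QE, computed here as the mean, over non-empty clusters, of the average distance from each member to its nearest centroid; this is a common definition in PSO clustering, and the paper's exact formula may differ. The inertia and acceleration weights (w = 0.72, c1 = c2 = 1.49) are assumed typical values. The optional labelled seeding replaces one particle with the class means of the labelled points. It is a hypothetical stand-in for the "initialisation only" ssPSO approach. The ssFCM component and the "throughout the learning process" approach are not modelled. All function names are illustrative.

```python
import math
import random


def dist(a, b):
    # Euclidean distance between two points
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def quantisation_error(data, centroids):
    # Assign each point to its nearest centroid, average member distances per
    # cluster, then average over non-empty clusters (assumed QE definition)
    clusters = [[] for _ in centroids]
    for x in data:
        k = min(range(len(centroids)), key=lambda j: dist(x, centroids[j]))
        clusters[k].append(dist(x, centroids[k]))
    nonempty = [c for c in clusters if c]
    return sum(sum(c) / len(c) for c in nonempty) / len(nonempty)


def class_means(labelled):
    # labelled: list of (point, label) pairs; one mean vector per label, sorted by label
    groups = {}
    for x, y in labelled:
        groups.setdefault(y, []).append(x)
    return [[sum(col) / len(pts) for col in zip(*pts)]
            for _, pts in sorted(groups.items())]


def pso_cluster(data, k, labelled=None, swarm=20, iters=20,
                w=0.72, c1=1.49, c2=1.49, seed=0):
    # Generic gbest PSO for clustering; each particle is a list of k centroids
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(x[d] for x in data) for d in range(dim)]
    hi = [max(x[d] for x in data) for d in range(dim)]
    pos = [[[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in range(k)]
           for _ in range(swarm)]
    if labelled:
        # Hypothetical "initialisation only" seeding: particle 0 starts at the
        # labelled class means (only when there is one class per cluster)
        means = class_means(labelled)
        if len(means) == k:
            pos[0] = means
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(swarm)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_f = [quantisation_error(data, p) for p in pos]
    g = min(range(swarm), key=lambda i: pbest_f[i])
    gbest, gbest_f = [c[:] for c in pbest[g]], pbest_f[g]
    for _ in range(iters):
        for i in range(swarm):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Standard velocity update: inertia + cognitive + social terms
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            f = quantisation_error(data, pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = [c[:] for c in pos[i]], f
                if f < gbest_f:
                    gbest, gbest_f = [c[:] for c in pos[i]], f
    return gbest, gbest_f
```

With a labelled seed, the returned QE can be no worse than that of the seeded particle, because the global best only ever improves. The defaults swarm=20 and iters=20 mirror the setting highlighted in the summary. No velocity clamping is applied, which is a simplifying choice for the sketch.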