Fuzzy clustering and fuzzy c-means partition cluster analysis and validation studies on a subset of citescore dataset

Bibliographic Details
Published in	International journal of electrical and computer engineering (Malacca, Malacca) Vol. 9; no. 4; pp. 2760-2770
Main Authors	Rajkumar, K. Varada; Yesubabu, Adimulam; Subrahmanyam, K.
Format	Journal Article
Language	English
Published	Yogyakarta: IAES Institute of Advanced Engineering and Science, 01.08.2019
Subjects	Algorithms; Cluster analysis; Clustering; Coefficient of variation; Data points; Datasets; Partitions; Statistical analysis
ISSN	2088-8708
DOI	10.11591/ijece.v9i4.pp2760-2770

More Information
Summary:	A hard partition clustering algorithm assigns a point that is equally distant from several clusters to exactly one of them, even though such a datum could plausibly be assigned to more than one cluster at once. Fuzzy cluster analysis instead assigns membership coefficients to data points that are equidistant between two clusters, so that a data point can belong to more than one cluster at the same time. For a subset of the CiteScore dataset, the fuzzy clustering (fanny) and fuzzy c-means (fcm) algorithms were implemented to study the data points that lie equally distant from each other. Before the analysis, the clusterability of the dataset was evaluated with the Hopkins statistic, which gave 0.4371, a value below 0.5, indicating that the data are highly clusterable. The optimal number of clusters was determined using the NbClust package, in which nine indices proposed three clusters as the best solution. Further, an appropriate value of the fuzziness parameter m was evaluated by examining how the distribution of membership values changes as m varies from 1 to 2. The coefficient of variation (CV), also known as relative variability, was evaluated to study the spread of the data. The time complexity of the fuzzy clustering (fanny) and fuzzy c-means algorithms was evaluated by keeping the number of data points constant and varying the number of clusters.
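The role of membership coefficients and of the fuzziness parameter m described in the summary can be illustrated with a minimal sketch. It is not the authors' code (the article works with the R functions fanny and fcm); it assumes the textbook fuzzy c-means membership formula u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)) with Euclidean distance and m strictly greater than 1. The function names, the cluster centers and the sample points are hypothetical and chosen only for illustration.

```python
import math
import statistics


def fcm_memberships(point, centers, m=2.0):
    """Textbook fuzzy c-means membership of one point in each cluster."""
    if m <= 1.0:
        raise ValueError("fuzziness parameter m must be greater than 1")
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center
    # (shared equally if several centers coincide with the point).
    on_center = [d == 0.0 for d in dists]
    if any(on_center):
        n = sum(on_center)
        return [1.0 / n if hit else 0.0 for hit in on_center]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)); memberships sum to 1.
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (relative variability)."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical illustration data, not taken from the CiteScore subset.
centers = [(0.0, 0.0), (4.0, 0.0)]
midpoint = (2.0, 0.0)      # equidistant from both centers
near_first = (1.0, 0.0)    # three times closer to the first center

for m in (1.2, 1.5, 2.0):
    print(m, fcm_memberships(midpoint, centers, m),
          fcm_memberships(near_first, centers, m))
```

Under these assumptions the equidistant point receives equal membership in both clusters for every admissible m, whereas the membership of the off-center point moves closer to a hard 0/1 assignment as m approaches 1. The second helper computes the coefficient of variation as the sample standard deviation divided by the mean.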