Soft Set Based Clustering and Its Comparison on Categorical Data

Bibliographic Details
Published in	2023 IEEE 9th Information Technology International Seminar (ITIS) pp. 1 - 5
Main Authors	Riyadi Yanto, Iwan Tri; WaiShiang, Cheah; Hidayat, Rahmat; Wahyudi, Rofiul; Suprihatin; Apriani, Ani
Format	Conference Proceeding
Language	English
Published	IEEE 18.10.2023
Subjects	categorical data; Distribution functions; Indexes; Information technology; Machine learning; multinomial distribution; Seminars; Soft set; Stability criteria; Time factors

More Information
Summary:	Clustering categorical data is challenging because it is difficult to measure how similar categorical objects are. Several methods, most recently centroid-based strategies, have been developed to reduce the complexity of computing similarity for categorical data, but they still require long processing times. This article proposes an alternative, soft set-based clustering (SSC), built on the probability function of the multivariate multinomial distribution. The data are represented as soft sets, and each soft set assigns a probability to each object. Following the multivariate multinomial distribution function, a joint cluster distribution function gives the probability of each object, and each object is assigned to the cluster with the highest likelihood. The approach is compared with baseline techniques on benchmark data sets from the UCI Machine Learning Repository. The results show that the proposed method outperforms the baselines in purity, Rand index, and computation time.
DOI:	10.1109/ITIS59651.2023.10419962
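The summary describes SSC only at a high level. The Python sketch below illustrates the general idea under several assumptions and is not the authors' implementation. Each (attribute, value) parameter is mapped to the set of objects that take that value, which is a soft set view of the table. The attributes are treated as independent multinomials estimated per cluster, with Laplace smoothing. Objects are then moved to their highest-likelihood cluster until the labels stop changing. The function names (`soft_set`, `log_likelihood`, `ssc_sketch`), the smoothing term `alpha`, the hard-assignment loop, and the toy data are all hypothetical. Purity and the Rand index use their standard definitions.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def soft_set(data):
    """Soft set view of a categorical table: each parameter (attribute index, value)
    maps to the set of object indices that take that value."""
    F = defaultdict(set)
    for i, row in enumerate(data):
        for a, v in enumerate(row):
            F[(a, v)].add(i)
    return dict(F)


def log_likelihood(i, members, data, F, n_values, alpha=1.0):
    """Log-probability of object i under one cluster, treating attributes as
    independent multinomials estimated from the cluster's members.
    Assumption: Laplace smoothing with parameter alpha (not from the paper)."""
    n = len(members)
    total = 0.0
    for a, v in enumerate(data[i]):
        count = len(F[(a, v)] & members)  # support of parameter (a, v) inside the cluster
        total += math.log((count + alpha) / (n + alpha * n_values[a]))
    return total


def ssc_sketch(data, k, init, max_iter=20):
    """Hypothetical hard-assignment loop: re-estimate cluster distributions, then
    move every object to its highest-likelihood cluster until labels are stable."""
    F = soft_set(data)
    n_values = Counter(a for (a, _) in F)  # number of distinct values per attribute
    labels = list(init)
    for _ in range(max_iter):
        clusters = [{i for i, c in enumerate(labels) if c == j} for j in range(k)]
        new = [max(range(k), key=lambda j: log_likelihood(i, clusters[j], data, F, n_values))
               for i in range(len(data))]
        if new == labels:
            break
        labels = new
    return labels


def purity(labels, truth):
    """Fraction of objects that belong to the majority true class of their cluster."""
    groups = defaultdict(list)
    for lab, t in zip(labels, truth):
        groups[lab].append(t)
    return sum(Counter(g).most_common(1)[0][1] for g in groups.values()) / len(truth)


def rand_index(labels, truth):
    """Fraction of object pairs on which the clustering and the ground truth agree
    (both place the pair together, or both place it apart)."""
    pairs = list(combinations(range(len(truth)), 2))
    agree = sum((labels[i] == labels[j]) == (truth[i] == truth[j]) for i, j in pairs)
    return agree / len(pairs)


# Hypothetical toy data (not from the paper): two obvious groups of categorical objects.
data = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
]
truth = [0, 0, 0, 1, 1, 1]
labels = ssc_sketch(data, k=2, init=[0, 1, 0, 1, 0, 1])
print(labels, purity(labels, truth), rand_index(labels, truth))  # → [0, 0, 0, 1, 1, 1] 1.0 1.0
```

Starting from a deliberately mixed partition, two passes are enough to separate the toy groups. A full implementation would also need an initialization strategy and a tie-breaking rule. Neither is specified in the summary.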