Soft Set Based Clustering and Its Comparison on Categorical Data


Bibliographic Details
Published in 2023 IEEE 9th Information Technology International Seminar (ITIS), pp. 1-5
Main Authors Riyadi Yanto, Iwan Tri; WaiShiang, Cheah; Hidayat, Rahmat; Wahyudi, Rofiul; Suprihatin; Apriani, Ani
Format Conference Proceeding
Language English
Published IEEE 18.10.2023

Summary: Clustering categorical data is difficult because similarity between categorical objects is hard to define and measure. Several methods, most recently centroid-based ones, have been developed to simplify the similarity computation for categorical data, but they still incur long processing times. This article proposes an alternative, soft set-based clustering (SSC), built on the probability function of the multivariate multinomial distribution. The data are represented as soft sets, and each soft set carries a probability for each object. The probability of each object under each cluster is obtained from the joint cluster distribution function of the multivariate multinomial distribution, and the object is assigned to the cluster with the highest probability. Benchmark data sets from the UCI Machine Learning Repository are used to compare the approach with baseline techniques. The results show that the proposed approach performs better in terms of purity, Rand index, and computation time.
DOI: 10.1109/ITIS59651.2023.10419962
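
The summary describes assigning each object to the cluster under which it has the highest multinomial probability, and evaluating the result with purity. The sketch below illustrates that general idea only; it is not the authors' SSC algorithm. It is a generic hard-assignment clustering with per-attribute multinomial probabilities. The function names, the Laplace smoothing parameter `alpha`, the farthest-first initialization, and the toy data are all assumptions introduced here for illustration.

```python
import math
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two objects disagree."""
    return sum(x != y for x, y in zip(a, b))


def initial_labels(data, k):
    """Deterministic farthest-first seeding (an assumption, not from the paper)."""
    seeds = [data[0]]
    while len(seeds) < k:
        # Pick the object that is farthest from its nearest seed.
        seeds.append(max(data, key=lambda x: min(hamming(x, s) for s in seeds)))
    return [min(range(k), key=lambda c: hamming(x, seeds[c])) for x in data]


def fit_multinomial_clusters(data, k, alpha=1.0, max_iter=50):
    """Hard-assignment clustering with smoothed per-attribute multinomials."""
    n, m = len(data), len(data[0])
    domain_sizes = [len({x[j] for x in data}) for j in range(m)]
    labels = initial_labels(data, k)
    for _ in range(max_iter):
        sizes = Counter(labels)
        # counts[c][j][v]: how often value v of attribute j occurs in cluster c.
        counts = [[Counter() for _ in range(m)] for _ in range(k)]
        for x, c in zip(data, labels):
            for j, v in enumerate(x):
                counts[c][j][v] += 1

        def log_score(x, c):
            # Log cluster prior plus the sum of per-attribute log-probabilities.
            s = math.log((sizes[c] + alpha) / (n + alpha * k))
            for j, v in enumerate(x):
                s += math.log((counts[c][j][v] + alpha)
                              / (sizes[c] + alpha * domain_sizes[j]))
            return s

        # Each object goes to the cluster with the highest probability.
        new_labels = [max(range(k), key=lambda c: log_score(x, c)) for x in data]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def purity(labels, truth):
    """Fraction of objects that belong to the majority true class of their cluster."""
    total = 0
    for c in set(labels):
        members = [t for l, t in zip(labels, truth) if l == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(labels)


# Toy data (hypothetical): three categorical attributes per object.
toy = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("blue", "small", "square"),
]
toy_truth = ["A", "A", "A", "B", "B", "B"]
toy_labels = fit_multinomial_clusters(toy, k=2)
print(toy_labels, purity(toy_labels, toy_truth))
```

Working in log space avoids numerical underflow when many attribute probabilities are multiplied, and the smoothing keeps an unseen attribute value from driving a cluster's probability to zero.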