Cluster Analysis of Categorical Data Using Association-Based Dissimilarity

Bibliographic Details
Published in 品質經營學會誌 (Journal of the Korean Society for Quality Management) Vol. 47; no. 2; pp. 271-281
Main Authors Changki Lee (이창기), Uk Jung (정욱)
Format Journal Article
Language Korean
Published The Korean Society for Quality Management 30.06.2019
ISSN 1229-1889, 2287-9005
DOI 10.7469/JKSQM.2019.47.2.271

Summary: Purpose: This study proposes a more efficient distance measure for cluster analysis of categorical data, one that takes the relationships between categorical variables into account. Methods: The association-based dissimilarity was used to calculate the distance between two categorical observations, and the resulting distances were supplied to the PAM clustering algorithm to verify the measure's effectiveness. The strength of association between two categorical values is calculated as a mixture of the dissimilarities between the conditional probability distributions of the other categorical variables, given each of these two values. The method is particularly suitable for datasets whose categorical variables are highly correlated. Results: Simulation results on several real-life datasets showed that the proposed distance, which considers relationships among the categorical variables, generally yielded better clustering performance than the Hamming distance. In addition, as the number of correlated variables increased, the difference in performance between the two distance measures became more statistically significant. Conclusion: Incorporating the relationships between categorical variables through the proposed method improved the results of cluster analysis.
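The Methods portion of the summary can be illustrated with a minimal sketch. It is not the authors' implementation. The summary does not say which dissimilarity is used between conditional distributions or how the "mixture" is weighted, so this sketch assumes total variation distance and a plain average over the other variables. All function names and the toy data are hypothetical, and the `pam` function is a simplified first-improvement swap search, not the full PAM build-and-swap procedure.

```python
from collections import Counter

def conditional_dist(rows, given_col, given_val, target_col):
    """Empirical P(target_col | given_col == given_val) as a dict.
    Assumes given_val occurs at least once in rows."""
    subset = [r[target_col] for r in rows if r[given_col] == given_val]
    counts = Counter(subset)
    return {v: c / len(subset) for v, c in counts.items()}

def total_variation(p, q):
    """Total variation distance between two discrete distributions
    (an assumed choice; the summary does not name the measure)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def value_dissimilarity(rows, col, a, b):
    """Dissimilarity between values a and b of column col: the average
    distance between the conditional distributions of every other column."""
    if a == b:
        return 0.0
    others = [j for j in range(len(rows[0])) if j != col]
    return sum(
        total_variation(conditional_dist(rows, col, a, j),
                        conditional_dist(rows, col, b, j))
        for j in others
    ) / len(others)

def association_distance(rows, x, y):
    """Distance between two observations: sum of per-column value dissimilarities."""
    return sum(value_dissimilarity(rows, j, x[j], y[j]) for j in range(len(x)))

def hamming_distance(x, y):
    """Baseline: number of columns on which the two observations differ."""
    return sum(a != b for a, b in zip(x, y))

def pam(dist, k):
    """Simplified k-medoids: start from the first k points, then accept any
    medoid swap that lowers the total distance to the nearest medoid."""
    n = len(dist)
    medoids = list(range(k))

    def cost(meds):
        return sum(min(dist[i][m] for m in meds) for i in range(n))

    best = cost(medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                c = cost(trial)
                if c < best - 1e-12:
                    medoids, best, improved = trial, c, True
    labels = [min(range(k), key=lambda m: dist[i][medoids[m]]) for i in range(n)]
    return medoids, labels

# Toy data (hypothetical): columns 0 and 2 are perfectly correlated.
rows = [
    ("a", "x", "p"), ("a", "x", "p"), ("a", "y", "p"),
    ("b", "y", "q"), ("b", "y", "q"), ("b", "x", "q"),
]
dist = [[association_distance(rows, x, y) for y in rows] for x in rows]
medoids, labels = pam(dist, k=2)
print(medoids, labels)
```

In this toy data, values of the two correlated columns end up further apart than values of the uncorrelated column, whereas the Hamming distance counts every mismatch as 1. Replacing `association_distance` with `hamming_distance` when building `dist` gives the baseline the summary compares against.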
Bibliography: The Korean Society for Quality Management
KISTI1.1003/JNL.JAKO201919866913389