Categorical Data Clustering Analysis Using Association-based Dissimilarity
Published in | 品質經營學會誌 Vol. 47; no. 2; pp. 271 - 281 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | Korean |
Published | 한국품질경영학회 30.06.2019 |
ISSN | 1229-1889 2287-9005 |
DOI | 10.7469/JKSQM.2019.47.2.271 |
Summary: | Purpose: This study proposes a more efficient distance measure for cluster analysis of categorical data, one that takes into account the relationships among categorical variables.
Methods: The association-based dissimilarity was used to calculate the distance between two categorical observations, and the resulting distances were supplied to the PAM (Partitioning Around Medoids) clustering algorithm to verify their effectiveness. The strength of association between two categorical values can be calculated as a mixture of dissimilarities between the conditional probability distributions of the other categorical variables, given each of these two values. The method is particularly suitable for datasets whose categorical variables are highly correlated.
Results: Simulations on several real-life datasets showed that the proposed distance, which considers the relationships among categorical variables, generally yielded better clustering performance than the Hamming distance. In addition, as the number of correlated variables increased, the performance difference between the two clustering methods based on the different distance measures became more statistically significant.
Conclusion: Incorporating the relationships among categorical variables through the proposed method positively affected the results of cluster analysis. |
---|---|
Bibliography: | The Korean Society for Quality Management KISTI1.1003/JNL.JAKO201919866913389 |
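
The approach outlined in the summary (an association-based dissimilarity between categorical observations, fed into PAM and compared with the Hamming distance) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several choices are assumptions made here for illustration: the total variation distance is used to compare conditional distributions, the per-variable dissimilarity is averaged over the other variables, PAM is reduced to a greedy BUILD step plus exhaustive SWAP, and all function names and the toy dataset are hypothetical.

```python
from collections import Counter
from itertools import combinations


def conditional_dist(data, j, value, k):
    """Empirical distribution of variable k among rows where variable j == value."""
    counts = Counter(row[k] for row in data if row[j] == value)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (assumed measure)."""
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in set(p) | set(q))


def value_dissimilarity(data, j, a, b):
    """Dissimilarity between values a and b of variable j: the average distance
    between the conditional distributions of every other variable."""
    if a == b:
        return 0.0
    others = [k for k in range(len(data[0])) if k != j]
    return sum(
        total_variation(conditional_dist(data, j, a, k),
                        conditional_dist(data, j, b, k))
        for k in others
    ) / len(others)


def association_distance_matrix(data):
    """Pairwise distances: sum over variables of the value dissimilarities."""
    n, m = len(data), len(data[0])
    cache = {}

    def d(j, a, b):
        key = (j, frozenset((a, b)))
        if key not in cache:
            cache[key] = value_dissimilarity(data, j, a, b)
        return cache[key]

    dist = [[0.0] * n for _ in range(n)]
    for r, s in combinations(range(n), 2):
        dist[r][s] = dist[s][r] = sum(d(j, data[r][j], data[s][j]) for j in range(m))
    return dist


def hamming_distance_matrix(data):
    """Baseline: number of variables on which two observations differ."""
    return [[sum(a != b for a, b in zip(x, y)) for y in data] for x in data]


def pam(dist, k):
    """Simplified PAM: greedy BUILD, then SWAP until no swap lowers the total cost."""
    n = len(dist)

    def cost(medoids):
        return sum(min(dist[i][m] for m in medoids) for i in range(n))

    medoids = []
    for _ in range(k):  # BUILD: add the point that lowers the cost the most
        candidates = (i for i in range(n) if i not in medoids)
        medoids.append(min(candidates, key=lambda i: cost(medoids + [i])))
    improved = True
    while improved:  # SWAP: try replacing each medoid with each non-medoid
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                if cost(trial) < cost(medoids):
                    medoids, improved = trial, True
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels


# Toy data (made up for illustration): three strongly associated variables.
data = [
    ("red", "round", "sweet"),
    ("red", "round", "sweet"),
    ("red", "round", "sour"),
    ("green", "long", "bitter"),
    ("green", "long", "bitter"),
    ("green", "long", "sour"),
]
dist = association_distance_matrix(data)
hamming = hamming_distance_matrix(data)
medoids, labels = pam(dist, k=2)
print(labels)
```

In this toy example the Hamming distance treats every mismatch as equally severe, whereas the association-based distance assigns a smaller penalty to a mismatch between values that co-occur with similar values of the other variables. Passing `hamming` instead of `dist` to `pam` gives the baseline clustering for comparison.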