A probabilistic framework for optimizing projected clusters with categorical attributes
Published in | Science China. Information sciences Vol. 58; no. 7; pp. 138 - 152 |
---|---|
Main Author | |
Format | Journal Article |
Language | English |
Published | Beijing; Science China Press; 01.07.2015; Springer Nature B.V |
Subjects | |
Summary: | The ability to discover projected clusters in high-dimensional data is essential for many machine learning applications. Projective clustering of categorical data remains a challenge because it is difficult to learn adaptive weights for categorical attributes in coordination with cluster optimization. In this paper, a probability-based learning framework is proposed, which allows both the attribute weights and the center-based clusters to be optimized by kernel density estimation on categorical attributes. A novel algorithm is then derived for projective clustering on categorical data, based on the new learning approach for the kernel bandwidth selection problem. We show that the attribute weight is closely connected to the kernel bandwidth, while the optimized cluster center corresponds to the normalized frequency estimator of the categorical attributes. Experimental results on synthetic and real-world data show that the proposed method significantly outperforms state-of-the-art algorithms. |
---|---|
Bibliography: | 11-5847/TP. Keywords: projective clustering, projected cluster, categorical data, probabilistic framework, kernel density estimation, attribute weighting |
ISSN: | 1674-733X 1869-1919 |
DOI: | 10.1007/s11432-014-5267-5 |
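
The summary's link between kernel bandwidth and the frequency estimator can be illustrated with a small sketch. It assumes the Aitchison-Aitken kernel, a standard kernel for unordered categorical data; this record does not state which kernel the paper uses, so the kernel choice, the function name `categorical_kde`, and its parameters are all illustrative assumptions and not the authors' algorithm.

```python
from collections import Counter


def categorical_kde(values, categories, bandwidth):
    """Kernel density estimate for one categorical attribute.

    Assumes the Aitchison-Aitken kernel (an illustrative choice, not
    confirmed as the paper's kernel):
        K(x, x_i) = 1 - bandwidth          if x == x_i
                    bandwidth / (c - 1)    otherwise
    where c is the number of categories.
    """
    c = len(categories)
    n = len(values)
    if c < 2 or n == 0:
        raise ValueError("need at least two categories and one observation")
    max_bandwidth = (c - 1) / c
    if not 0.0 <= bandwidth <= max_bandwidth:
        raise ValueError("bandwidth must lie in [0, (c - 1) / c]")

    counts = Counter(values)
    estimate = {}
    for cat in categories:
        freq = counts[cat] / n  # normalized frequency of this category
        # Averaging the kernel over all observations gives a mix of the
        # observed frequency and the mass spread over the other categories.
        estimate[cat] = (1 - bandwidth) * freq + bandwidth / (c - 1) * (1 - freq)
    return estimate


values = ["a", "a", "a", "b"]
categories = ["a", "b", "c"]
sharp = categorical_kde(values, categories, 0.0)            # plain frequencies
flat = categorical_kde(values, categories, (3 - 1) / 3)     # uniform estimate
```

Under this assumed kernel, a bandwidth of zero reproduces the normalized frequency estimator, while the maximum bandwidth flattens the estimate to a uniform distribution. An attribute whose best bandwidth is large therefore carries little information about the cluster, which is one way to read the summary's statement that attribute weight is connected to kernel bandwidth.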