A probabilistic framework for optimizing projected clusters with categorical attributes


Bibliographic Details
Published in Science China. Information sciences Vol. 58; no. 7; pp. 138-152
Main Author Chen, LiFei
Format Journal Article
Language English
Published Beijing Science China Press 01.07.2015
Springer Nature B.V
Summary: The ability to discover projected clusters in high-dimensional data is essential for many machine learning applications. Projective clustering of categorical data remains a challenge because it is difficult to learn adaptive weights for categorical attributes in coordination with cluster optimization. In this paper, a probability-based learning framework is proposed, which allows both the attribute weights and the center-based clusters to be optimized by kernel density estimation on categorical attributes. A novel algorithm is then derived for projective clustering on categorical data, based on the new learning approach for the kernel bandwidth selection problem. We show that the attribute weight is closely connected to the kernel bandwidth, while the optimized cluster center corresponds to the normalized frequency estimator of the categorical attributes. Experimental results on synthetic and real-world data show that the proposed method significantly outperforms state-of-the-art algorithms.
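To illustrate the idea of kernel density estimation on a categorical attribute, the following is a minimal sketch and not the paper's algorithm. It assumes the standard Aitchison-Aitken kernel, which this record does not name; the function name `aitchison_aitken_kde` and the toy data are hypothetical. With bandwidth 0 the estimate equals the plain category frequencies, and at the maximum bandwidth it becomes uniform, which gives some intuition for why a bandwidth could act as an indicator of how informative an attribute is within a cluster.

```python
from collections import Counter


def aitchison_aitken_kde(values, categories, bandwidth):
    """Smoothed probability of each category for one attribute in one cluster.

    Assumption: Aitchison-Aitken kernel, k(x, o) = 1 - bandwidth if x == o,
    else bandwidth / (c - 1), where c is the number of categories.
    """
    c = len(categories)
    if c < 2:
        raise ValueError("need at least two categories")
    if not values:
        raise ValueError("need at least one observed value")
    if not 0.0 <= bandwidth <= (c - 1) / c:
        raise ValueError("bandwidth must lie in [0, (c - 1) / c]")

    n = len(values)
    counts = Counter(values)
    estimate = {}
    for o in categories:
        freq = counts[o] / n  # plain frequency estimator of category o
        # Averaging the kernel over all observations gives a mixture of the
        # frequency of o and the frequency of every other category.
        estimate[o] = (1.0 - bandwidth) * freq + bandwidth / (c - 1) * (1.0 - freq)
    return estimate


# Hypothetical toy data: values of one attribute inside one cluster.
cluster_values = ["a", "a", "a", "b"]
categories = ["a", "b", "c"]

sharp = aitchison_aitken_kde(cluster_values, categories, 0.0)   # plain frequencies
smooth = aitchison_aitken_kde(cluster_values, categories, 0.5)  # partly smoothed
print(sharp)
print(smooth)
```

In this sketch the estimates always sum to 1 across categories, so they can be read as a probabilistic cluster center for the attribute. How the paper selects the bandwidth and maps it to an attribute weight is not described in this record.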
Bibliography:11-5847/TP
projective clustering, projected cluster, categorical data, probabilistic framework, kernel density estimation, attribute weighting
ISSN:1674-733X
1869-1919
DOI:10.1007/s11432-014-5267-5