Deep Clustering of Tabular Data by Weighted Gaussian Distribution Learning
Main Authors | , , |
---|---|
Format | Journal Article |
Language | English |
Published | 02.01.2023 |
Subjects | |
Summary: | Deep learning methods have been proposed primarily for supervised
learning on images or text, with limited application to clustering problems. In contrast,
tabular data with heterogeneous features pose unique challenges in
representation learning, where deep learning has yet to replace traditional
machine learning. This paper addresses these challenges by developing one of
the first deep clustering methods for tabular data: Gaussian Cluster Embedding
in Autoencoder Latent Space (G-CEALS). G-CEALS is an unsupervised deep
clustering framework for learning the parameters of multivariate Gaussian
cluster distributions by iteratively updating individual cluster weights.
Across sixteen tabular data sets, G-CEALS achieves average ranks of 2.9 (1.7)
and 2.8 (1.7) on clustering accuracy and adjusted Rand index (ARI),
respectively, and outperforms nine state-of-the-art clustering methods.
G-CEALS substantially improves clustering performance compared to
traditional K-means and GMM, which are still de facto methods for clustering
tabular data. Similar computationally efficient and high-performing deep
clustering frameworks are imperative to reap the myriad benefits of deep
learning on tabular data over traditional machine learning. |
---|---|
DOI: | 10.48550/arxiv.2301.00802 |
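
The summary reports results in terms of clustering accuracy and adjusted Rand index (ARI). As a reading aid, here is a minimal sketch of how these two scores are commonly computed from ground-truth classes and predicted cluster labels. It is not the paper's evaluation code: the function names `adjusted_rand_index` and `clustering_accuracy` are hypothetical, and the accuracy routine assumes a small number of clusters because it searches every cluster-to-class mapping by brute force rather than using an assignment solver.

```python
from collections import Counter
from itertools import permutations
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """ARI from pair counts over the contingency table (needs >= 2 samples)."""
    n = len(true_labels)
    cells = Counter(zip(true_labels, pred_labels))
    # Number of sample pairs that share a cell, a true class, or a cluster.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate partitions (for example, one cluster on both sides).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def clustering_accuracy(true_labels, pred_labels):
    """Best accuracy over one-to-one mappings from clusters to classes.

    Assumption: only a handful of clusters, since every mapping is tried.
    """
    classes = list(set(true_labels))
    clusters = list(set(pred_labels))
    cells = Counter(zip(true_labels, pred_labels))
    # Pad with placeholder objects so surplus clusters can stay unmatched.
    surplus = max(0, len(clusters) - len(classes))
    targets = classes + [object() for _ in range(surplus)]
    best_hits = 0
    for mapping in permutations(targets, len(clusters)):
        hits = sum(cells[(cls, clu)] for cls, clu in zip(mapping, clusters))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


if __name__ == "__main__":
    y_true = [0, 0, 0, 1, 1, 1]
    y_pred = [1, 1, 0, 0, 0, 0]
    print("accuracy:", clustering_accuracy(y_true, y_pred))
    print("ARI:", adjusted_rand_index(y_true, y_pred))
```

Both scores ignore the cluster label values themselves: swapping the names of two clusters changes neither result. ARI additionally corrects for chance agreement, so unrelated partitions score near zero on average.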