Model-based clustering of high-dimensional data: A review

Bibliographic Details
Published in	Computational Statistics & Data Analysis, Vol. 71, pp. 52-78
Main Authors	Bouveyron, Charles; Brunet-Saumard, Camille
Format	Journal Article
Language	English
Published	Elsevier B.V., 01.03.2014
Subjects	Clustering; Computer programs; Data collection; Data processing; Dimension reduction; Flexibility; High-dimensional data; Mathematics; Model-based clustering; Parsimonious models; R package; Regularization; Software; Statistics; Statistics Theory; Subspace clustering; Variable selection

More Information
Summary:	Model-based clustering is a popular tool, renowned for its probabilistic foundations and its flexibility. However, high-dimensional data are increasingly common and, unfortunately, classical model-based clustering techniques perform poorly in high-dimensional spaces. This is mainly because model-based clustering methods are dramatically over-parametrized in this setting. Nevertheless, high-dimensional spaces have specific characteristics that are useful for clustering, and recent techniques exploit them. After recalling the basics of model-based clustering, the article reviews dimension reduction approaches, regularization-based techniques, parsimonious modeling, subspace clustering methods, and clustering methods based on variable selection. Existing software for model-based clustering of high-dimensional data is also reviewed, and its practical use is illustrated on real-world data sets.
ISSN:	0167-9473 1872-7352
DOI:	10.1016/j.csda.2012.12.008