K-Means and Related Clustering Methods
Published in | Core Concepts in Data Analysis pp. 221 - 281 |
---|---|
Main Author | |
Format | Book Chapter |
Language | English |
Published | United Kingdom: Springer London, Limited (Springer London), 2011 |
Series | Undergraduate Topics in Computer Science |
Subjects | |
Summary: | K-Means is arguably the most popular data analysis method. It outputs a partition of the entity set into clusters, together with centroids representing them. It is very intuitive and usually takes just a few pages to present. This text covers a number of less popular subjects that are important when using K-Means for real-world data analysis: data standardization, especially at mixed scales; innate tools for interpretation of clusters; analysis of examples of K-Means working and of its failures; and initialization, that is, the choice of the number of clusters and the location of centroids. Versions of K-Means such as incremental K-Means, nature-inspired K-Means, and entity-centroid "medoid" methods are presented. Three modifications of K-Means onto different cluster structures are given: Fuzzy K-Means for finding fuzzy clusters, Expectation-Maximization (EM) for finding probabilistic clusters, and Kohonen self-organizing maps (SOM), which tie the sought clusters to a visually convenient two-dimensional grid. Equivalent reformulations of the K-Means criterion are described; they can yield different algorithms for K-Means. One of these is explained at length: K-Means extends principal component analysis to the case of binary scoring factors, which yields the so-called Anomalous cluster method, a key to an intelligent version of K-Means with automated choice of the number of clusters and their initialization. |
---|---|
ISBN: | 0857292862 9780857292865 |
ISSN: | 1863-7310 |
DOI: | 10.1007/978-0-85729-287-2_6 |