K-Means and Related Clustering Methods

Bibliographic Details
Published in	Core Concepts in Data Analysis, pp. 221-281
Main Author	Mirkin, Boris
Format	Book Chapter
Language	English
Published	United Kingdom: Springer London, Limited, 2011
Series	Undergraduate Topics in Computer Science
Subjects	Artificial intelligence; Cluster Centroid; Company Data; Data Scatter; Discrete mathematics; Fuzzy Cluster; Gravity Center; Maths for computer scientists
Summary:	K-Means is arguably the most popular data analysis method. The method outputs a partition of the entity set into clusters, together with centroids representing them. It is very intuitive and usually takes just a few pages to present. This text includes a number of less popular subjects that are important when using K-Means for real-world data analysis: data standardization, especially at mixed scales; innate tools for interpretation of clusters; analysis of examples of K-Means working and of its failures; and initialization, that is, the choice of the number of clusters and the location of centroids. Versions of K-Means such as incremental K-Means, nature-inspired K-Means, and entity-centroid "medoid" methods are presented. Three modifications of K-Means for different cluster structures are given: Fuzzy K-Means for finding fuzzy clusters, Expectation-Maximization (EM) for finding probabilistic clusters, and Kohonen self-organizing maps (SOM), which tie the sought clusters to a visually convenient two-dimensional grid. Equivalent reformulations of the K-Means criterion are described; they can yield different algorithms for K-Means. One of these is explained at length: K-Means extends principal component analysis to the case of binary scoring factors, which yields the so-called Anomalous cluster method, a key to an intelligent version of K-Means with automated choice of the number of clusters and their initialization.
ISBN:	0857292862 9780857292865
ISSN:	1863-7310
DOI:	10.1007/978-0-85729-287-2_6
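
The summary describes K-Means as producing a partition of the entities plus one centroid per cluster. As a rough illustration only, here is a minimal sketch of the generic batch K-Means procedure in plain Python. It is not taken from the chapter: the function name kmeans, the choice of the first k points as initial centroids, and the sample data are all assumptions made for this example.

```python
def kmeans(points, k, max_iter=100):
    """Generic batch K-Means on a list of equal-length numeric tuples.

    Returns (labels, centroids). The initial centroids are simply the
    first k points: an illustrative assumption, not a recommendation.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each entity joins its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # partition unchanged, so the centroids would not move
        labels = new_labels
        # Update step: each centroid becomes the mean of its cluster.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids


# Hypothetical sample data: two well-separated groups in the plane.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0),
          (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
labels, centroids = kmeans(points, k=2)
```

With this data, labels holds the cluster index of each point and centroids holds the two cluster means. Because the result depends on the initial centroids, a different starting choice can give a different partition, which is why the summary lists initialization among the practically important topics.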