Merging K‐means with hierarchical clustering for identifying general‐shaped groups

Bibliographic Details
Published in	Stat (International Statistical Institute) Vol. 7; no. 1
Main Authors	Peterson, Anna D.; Ghosh, Arka P.; Maitra, Ranjan
Format	Journal Article
Language	English
Published	United States: Wiley Subscription Services, Inc, 2018
Subjects	Cluster analysis; Clustering; Complete linkage; Computer simulation; Datasets; Distance measure; Distance measurement; Hierarchical clustering; Identification methods; K‐means algorithm; Partitions; Single linkage
Summary:	Clustering partitions a dataset such that observations placed together in a group are similar but different from those in other groups. Hierarchical and K‐means clustering are two approaches but have different strengths and weaknesses. For instance, hierarchical clustering identifies groups in a tree‐like structure but suffers from computational complexity in large datasets, while K‐means clustering is efficient but designed to identify homogeneous spherically shaped clusters. We present a hybrid non‐parametric clustering approach that amalgamates the two methods to identify general‐shaped clusters and that can be applied to larger datasets. Specifically, we first partition the dataset into spherical groups using K‐means. We next merge these groups using hierarchical methods with a data‐driven distance measure as a stopping criterion. Our proposal has the potential to reveal groups with general shapes and structure in a dataset. We demonstrate good performance on several simulated and real datasets. Copyright © 2018 John Wiley & Sons, Ltd.
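
The two-stage idea in the summary (partition with K‐means, then merge the resulting groups hierarchically) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a deterministic K‐means start, single linkage computed between group centroids, and a fixed, user-supplied `threshold` standing in for the paper's data-driven stopping criterion, which the summary does not specify. The names `kmeans`, `merge_groups`, `hybrid_cluster` and `threshold` are hypothetical.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's algorithm on a list of coordinate tuples; returns (centroids, labels)."""
    # Assumption: a simple deterministic start using k evenly strided input points.
    centroids = [points[i * len(points) // k] for i in range(k)]

    def nearest(p):
        # Index of the centroid closest to p in Euclidean distance.
        return min(range(k), key=lambda j: math.dist(p, centroids[j]))

    for _ in range(iters):
        labels = [nearest(p) for p in points]
        updated = list(centroids)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty group keeps its previous centroid
                updated[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
        if updated == centroids:  # no centroid moved, so the partition is stable
            break
        centroids = updated
    return centroids, [nearest(p) for p in points]


def merge_groups(centroids, threshold):
    """Single-linkage merge of K-means groups, cut at a fixed height.

    Assumption: linkage is computed between centroids, and `threshold` is a
    fixed stand-in for a data-driven stopping rule.
    """
    parent = list(range(len(centroids)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if math.dist(centroids[i], centroids[j]) <= threshold:
                parent[find(i)] = find(j)
    roots = sorted({find(i) for i in range(len(centroids))})
    return [roots.index(find(i)) for i in range(len(centroids))]


def hybrid_cluster(points, k, threshold):
    """Stage 1: K-means into k small groups. Stage 2: merge nearby groups."""
    centroids, labels = kmeans(points, k)
    merged = merge_groups(centroids, threshold)
    return [merged[lab] for lab in labels]


if __name__ == "__main__":
    # Two parallel line-shaped groups, which a single spherical cluster fits poorly.
    data = [(float(x), 0.0) for x in range(20)] + [(float(x), 10.0) for x in range(20)]
    result = hybrid_cluster(data, k=8, threshold=7.0)
    print("merged groups found:", len(set(result)))
```

Deliberately choosing `k` larger than the expected number of groups lets each elongated group be covered by several small spherical pieces, which the merge step then chains back together. In practice the fixed threshold would be replaced by whatever data-driven criterion the full article defines.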
ISSN:	2049-1573
DOI:	10.1002/sta4.172