A Taxonomy of Visual Cluster Separation Factors

Bibliographic Details
Published in	Computer graphics forum Vol. 31; no. 3pt4; pp. 1335 - 1344
Main Authors	Sedlmair, M., Tatu, A., Munzner, T., Tory, M.
Format	Journal Article
Language	English
Published	Oxford, UK: Blackwell Publishing Ltd, 01.06.2012
Subjects	Analysis; Categories; Clusters; Computer graphics; Curvature; Datasets; Encoding; Failure; H.5.0 [Information Interfaces and Presentation]: General; J.0 [Computer Applications]: General; Scatter diagrams; Separation; Studies; Taxonomy; Visual

More Information
Summary:	We provide two contributions, a taxonomy of visual cluster separation factors in scatterplots, and an in‐depth qualitative evaluation of two recently proposed and validated separation measures. We initially intended to use these measures to provide guidance for the use of dimension reduction (DR) techniques and visual encoding (VE) choices, but found that they failed to produce reliable results. To understand why, we conducted a systematic qualitative data study covering a broad collection of 75 real and synthetic high‐dimensional datasets, four DR techniques, and three scatterplot‐based visual encodings. Two authors visually inspected over 800 plots to determine whether or not the measures created plausible results. We found that they failed in over half the cases overall, and in over two‐thirds of the cases involving real datasets. Using open and axial coding of failure reasons and separability characteristics, we generated a taxonomy of visual cluster separability factors. We iteratively refined its explanatory clarity and power by mapping the studied datasets and success and failure ranges of the measures onto the factor axes. Our taxonomy has four categories, ordered by their ability to influence successors: Scale, Point Distance, Shape, and Position. Each category is split into Within‐Cluster factors such as density, curvature, isotropy, and clumpiness, and Between‐Cluster factors that arise from the variance of these properties, culminating in the overarching factor of class separation. The resulting taxonomy can be used to guide the design and the evaluation of cluster separation measures.
Bibliography:	ark:/67375/WNG-0Q0W0M35-P; istex:92531515E8ABAEFB249A539355A9F5F7AFD14592; ArticleID:CGF3125
ISSN:	0167-7055, 1467-8659
DOI:	10.1111/j.1467-8659.2012.03125.x