Summary Statistics for Partitionings and Feature Allocations

Bibliographic Details
Published in: arXiv.org
Main Authors: Fidaner, Işık Barış; Cemgil, Ali Taylan
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 25.11.2013

Summary: Infinite mixture models are commonly used for clustering. One can sample from the posterior of mixture assignments by Monte Carlo methods or find its maximum a posteriori solution by optimization. However, in some problems the posterior is diffuse and it is hard to interpret the sampled partitionings. In this paper, we introduce novel statistics based on block sizes for representing sample sets of partitionings and feature allocations. We develop an element-based definition of entropy to quantify segmentation among their elements. Then we propose a simple algorithm called entropy agglomeration (EA) to summarize and visualize this information. Experiments on various infinite mixture posteriors as well as a feature allocation dataset demonstrate that the proposed statistics are useful in practice.
ISSN: 2331-8422