Simultaneous Gaussian model-based clustering for samples of multiple origins

Gaussian mixture model-based clustering is now a standard tool to estimate some hypothetical underlying partition of a single dataset. In this paper, we aim to cluster several different datasets at the same time in a context where underlying populations, even though different, are not completely unr...

Full description

Saved in:
Bibliographic Details
Published inComputational statistics Vol. 28; no. 1; pp. 371 - 391
Main Authors Lourme, A., Biernacki, C.
Format Journal Article
LanguageEnglish
Published Berlin/Heidelberg Springer-Verlag 01.02.2013
Springer Nature B.V
Springer Verlag
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Gaussian mixture model-based clustering is now a standard tool to estimate some hypothetical underlying partition of a single dataset. In this paper, we aim to cluster several different datasets at the same time in a context where underlying populations, even though different, are not completely unrelated: All individuals are described by the same features and partitions of identical meaning are expected. Justifying from some natural arguments a stochastic linear link between the components of the mixtures associated to each dataset, we propose some parsimonious and meaningful models for a so-called simultaneous clustering method. Maximum likelihood mixture parameters, subject to the linear link constraint, can be easily estimated by a Generalized Expectation Maximization algorithm that we describe. Some promising results are obtained in a biological context where simultaneous clustering outperforms independent clustering for partitioning three different subspecies of birds. Further results on ornithological data show that the proposed strategy is robust to the relaxation of the exact descriptor concordance which is one of its main assumptions.
Bibliography:SourceType-Scholarly Journals-1
ObjectType-Feature-1
content type line 14
ObjectType-Article-2
content type line 23
ISSN:0943-4062
1613-9658
DOI:10.1007/s00180-012-0305-5