Robust Bayesian cluster enumeration based on the t distribution

Bibliographic Details
Published in	Signal processing Vol. 182; p. 107870
Main Authors	Teklehaymanot, Freweyni K., Muma, Michael, Zoubir, Abdelhak M.
Format	Journal Article
Language	English
Published	Elsevier B.V. 01.05.2021
Subjects	Bayesian Information Criterion; Cluster analysis; Cluster Enumeration; Multivariate tν distribution; Outlier; Robust
Summary:
• Novel robust cluster enumeration criterion derived using Bayes' theorem.
• Maximizes posterior probability among t-distributed candidate models.
• Penalty term without asymptotic approximations derived for finite sample sizes.
• Two-step robust clustering and enumeration algorithm proposed.
• Successful real-data application and benchmarking against existing methods.

A major challenge in cluster analysis is that the number of data clusters is usually unknown and must be estimated before the observed data can be clustered. In real-world applications, the observed data is often subject to heavy-tailed noise and outliers, which obscure the true underlying structure of the data and make estimating the number of clusters difficult. To address this, we derive a robust cluster enumeration criterion by formulating the estimation of the number of clusters as maximization of the posterior probability of multivariate tν-distributed candidate models. We use Bayes' theorem and asymptotic approximations to derive a robust criterion that has a closed-form expression. We then refine the derivation and provide a robust cluster enumeration criterion for data sets with finite sample size. Both robust criteria require an estimate of the cluster parameters for each candidate model as input. Hence, we propose a two-step cluster enumeration algorithm that first uses the expectation maximization (EM) algorithm to partition the data and estimate the cluster parameters, and then evaluates one of the robust criteria. The performance of the proposed algorithm is tested and compared with existing cluster enumeration methods in numerical and real-data experiments.
ISSN:	0165-1684 1872-7557
DOI:	10.1016/j.sigpro.2020.107870
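
The two-step procedure described in the summary (fit each candidate model with EM, then score it with a criterion and keep the best candidate) can be illustrated with a minimal sketch. This is not the authors' method: the record does not give the robust criteria, so the sketch substitutes a one-dimensional Gaussian mixture and the classical Bayesian Information Criterion, which are not robust to outliers. The function names `fit_gmm_1d` and `enumerate_clusters`, the quantile-based initialization, the variance floor, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def log_normal_pdf(v, mean, var):
    """Log-density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)


def fit_gmm_1d(x, k, n_iter=100, var_floor=1e-3):
    """Step 1 (stand-in): EM for a 1-D Gaussian mixture with k components.

    Returns the log-likelihood evaluated in the final E-step.
    """
    n = len(x)
    xs = sorted(x)
    # Assumed deterministic initialization: means at evenly spaced quantiles.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(x) / n
    grand_var = sum((v - grand_mean) ** 2 for v in x) / n
    variances = [max(grand_var, var_floor)] * k
    weights = [1.0 / k] * k
    loglik = -math.inf
    for _ in range(n_iter):
        # E-step: responsibilities, computed in the log domain for stability.
        resp = []
        loglik = 0.0
        for v in x:
            logs = [math.log(w) + log_normal_pdf(v, m, s2)
                    for w, m, s2 in zip(weights, means, variances)]
            top = max(logs)
            log_total = top + math.log(sum(math.exp(l - top) for l in logs))
            loglik += log_total
            resp.append([math.exp(l - log_total) for l in logs])
        # M-step: update weights, means, and variances (floored to avoid collapse).
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            variances[j] = max(
                sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, x)) / nj,
                var_floor,
            )
    return loglik


def enumerate_clusters(x, candidates):
    """Step 2 (stand-in): score each candidate with classical BIC, keep the maximum."""
    n = len(x)
    scores = {}
    for k in candidates:
        loglik = fit_gmm_1d(x, k)
        n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
        scores[k] = loglik - 0.5 * n_params * math.log(n)
    return max(scores, key=scores.get), scores


# Synthetic example data (an assumption): two well-separated clusters.
rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(150)]
        + [rng.gauss(10.0, 1.0) for _ in range(150)])
k_hat, scores = enumerate_clusters(data, range(1, 5))
print(k_hat, scores)
```

The penalty term `0.5 * n_params * math.log(n)` is the standard asymptotic BIC penalty; the summary indicates that the paper replaces this kind of term with one derived for t-distributed models, including a finite-sample variant, which this sketch does not attempt to reproduce.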