One-step unsupervised clustering based on information theoretic metric and adaptive neighbor manifold regularization


Bibliographic Details
Published in Engineering Applications of Artificial Intelligence Vol. 120; p. 105880
Main Authors Li, Xinyu; Fan, Hui; Liu, Jinglei
Format Journal Article
Language English
Published Elsevier Ltd 01.04.2023
Summary: Graph-based clustering is a fundamental topic in machine learning, but most existing methods share the following deficiencies. First, the extra discretization procedure makes the algorithm unstable. Second, the traditional way of constructing similarity graphs relies on pairwise distances, so it is extremely sensitive to the original data and lacks a clear physical meaning from the perspective of probabilistic prediction. Finally, traditional metrics based on Euclidean distance have difficulty handling non-Gaussian noise. To address these limitations, a one-step unsupervised clustering method based on an information-theoretic metric and adaptive neighbor manifold regularization (ITMNMR) is proposed. (1) The clustering results are obtained directly from the constructed similarity graph, avoiding extra discretization procedures. (2) A maximum entropy regularization term is introduced into the probabilistic model to avoid trivial similarity distributions. Furthermore, a Laplacian rank constraint and the ℓ0-norm are introduced to construct adaptive neighbors with sparsity and strength segmentation capabilities. (3) To overcome the influence of noise, correntropy-based reconstruction is introduced to handle non-Gaussian noise, and graph regularization is performed on the clean data. In addition, a half-quadratic optimization method is used to transform the problem into a quadratic form that is easier to solve. Finally, an empirical study on 9 datasets shows encouraging results for ITMNMR in comparison with classical and state-of-the-art algorithms. The robustness of the proposed method is also demonstrated in three experiments that add Laplacian noise, salt-and-pepper noise, and block occlusion.
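The idea of reading cluster labels directly from the similarity graph can be illustrated with a small sketch. It relies on a standard fact from spectral graph theory: the number of zero eigenvalues of a graph Laplacian equals the number of connected components, so a Laplacian of rank n − c corresponds to exactly c components. The toy matrix `S`, the helper names `laplacian` and `component_labels`, and the tolerances are assumptions made for illustration; this is not the ITMNMR optimization itself, only the final label-extraction step under the assumption that the learned graph already has the desired block structure.

```python
import numpy as np

def laplacian(S):
    """Unnormalized Laplacian of the symmetrized similarity matrix."""
    W = (S + S.T) / 2.0
    return np.diag(W.sum(axis=1)) - W

def component_labels(S, tol=1e-10):
    """Label each node with the index of its connected component (depth-first search)."""
    n = S.shape[0]
    W = (S + S.T) / 2.0
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # node already assigned to a component
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.nonzero(W[i] > tol)[0]:
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Hypothetical similarity graph with two groups: each row is a probability
# distribution over neighbors, and there are no edges between the groups.
S = np.array([
    [0.0, 0.6, 0.4, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.3, 0.7, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
])

L = laplacian(S)
eigvals = np.linalg.eigvalsh(L)          # L is symmetric, so eigvalsh applies
n_zero = int(np.sum(eigvals < 1e-8))     # multiplicity of the zero eigenvalue
labels = component_labels(S)
print(n_zero, labels)
```

Because the labels come from graph connectivity alone, no k-means or other discretization step is needed after the graph is built, which is the instability the summary refers to.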
• Additional discretization steps are avoided by generating clustering labels directly from the similarity graph.
• A maximum entropy regularization term is introduced to avoid trivial similarity distributions.
• A Laplacian rank constraint and the ℓ0-norm are introduced to construct the similarity graph.
• Correntropy is introduced to handle non-Gaussian and heavy-tailed noise.
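To give a feel for why correntropy helps with heavy-tailed noise, and how half-quadratic optimization reduces it to quadratic subproblems, the following sketch estimates a one-dimensional location parameter. It maximizes the sum of Gaussian kernels exp(−(x_i − m)² / (2σ²)) by alternating between fixing auxiliary weights and solving a weighted least-squares problem. The function name `correntropy_location`, the kernel width `sigma`, and the toy data are all assumptions for illustration; the paper applies the same principle to data reconstruction, which is not reproduced here.

```python
import numpy as np

def correntropy_location(x, sigma=2.0, n_iter=50):
    """Half-quadratic (iteratively reweighted) maximizer of the correntropy objective."""
    m = x.mean()  # deliberately start from the outlier-sensitive estimate
    for _ in range(n_iter):
        # Auxiliary weights: points with large residuals are down-weighted
        # exponentially, so their influence is bounded.
        w = np.exp(-((x - m) ** 2) / (2.0 * sigma ** 2))
        # With the weights fixed, the problem is a weighted least-squares fit,
        # whose closed-form solution is the weighted mean.
        m = np.sum(w * x) / np.sum(w)
    return m

# Hypothetical data: 21 inliers spread symmetrically around 0, plus 3 gross outliers.
inliers = np.linspace(-1.0, 1.0, 21)
outliers = np.array([50.0, 50.0, 50.0])
x = np.concatenate([inliers, outliers])

plain_mean = x.mean()                    # least-squares (Euclidean) estimate
robust_est = correntropy_location(x)     # correntropy-based estimate
print(plain_mean, robust_est)
```

The least-squares estimate is pulled toward the outliers, while the correntropy estimate stays near the center of the inliers. The kernel width controls how aggressively large residuals are discounted, and choosing it is a design decision this sketch does not address.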
ISSN: 0952-1976, 1873-6769
DOI:10.1016/j.engappai.2023.105880