Nonparametric Bootstrap Likelihood Estimation to Investigate the Chance Set-Up on Clustering Results

Bibliographic Details
Published in IEEE Open Journal of the Computer Society, Vol. 6, pp. 438-448
Main Authors Elnour, Ammar; Yang, Wencheng; Li, Yan
Format Journal Article
Language English
Published New York IEEE 2025
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Clustering algorithms are widely used in the knowledge discovery domain, but the validity of their results remains open to question. The datasets commonly used for clustering tasks are often large and scale-free, making conventional statistical techniques inadequate for analyzing result uncertainty. The same issue applies to most outcomes obtained from other knowledge discovery techniques, such as machine learning and statistical learning. Traditional statistical methods assume that the data follow standard distributions, whereas resampling and bootstrapping methods offer more accurate and reliable alternatives. This article introduces a method that employs bootstrap likelihood estimation to infer the uncertainty of generated clustering structures. We first calculate the clustering error on the original dataset and then use the proposed method to estimate its nonparametric bootstrapped likelihood. Comparing these two values yields a nonparametric significance testing framework that directly determines the validity of the result. To evaluate the effectiveness of our method, we conducted experiments using synthetic and real datasets. The results demonstrate that our method can successfully validate clustering results.
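The comparison described in the summary (an observed clustering error set against a resampled reference) can be illustrated with a minimal sketch. This is not the authors' procedure: the record does not describe their likelihood estimator, so the sketch substitutes a simpler resampling test. It assumes k-means with mean squared distance as the clustering error, and it assumes that "chance" means each feature is resampled independently with replacement, which breaks any joint cluster structure. All function names (`kmeans_error`, `resample_without_structure`, `bootstrap_p_value`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_error(points, k, rng, iters=25):
    """Run plain Lloyd's k-means and return the mean squared distance
    from each point to its nearest centroid (the clustering error)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        # Update step: an empty cluster keeps its previous centroid.
        for j, group in enumerate(groups):
            if group:
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    total = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return total / len(points)


def resample_without_structure(points, rng):
    """Assumed 'chance' reference: draw each feature independently, with
    replacement, from its own observed values."""
    n = len(points)
    columns = list(zip(*points))
    new_columns = [[rng.choice(col) for _ in range(n)] for col in columns]
    return list(zip(*new_columns))


def bootstrap_p_value(points, k, n_boot=99, seed=0):
    """Compare the observed clustering error with errors obtained on
    structure-free resamples; return (observed, null_errors, p_value)."""
    rng = random.Random(seed)
    observed = kmeans_error(points, k, rng)
    null_errors = [
        kmeans_error(resample_without_structure(points, rng), k, rng)
        for _ in range(n_boot)
    ]
    # Empirical p-value with the usual +1 correction.
    hits = sum(1 for e in null_errors if e <= observed)
    return observed, null_errors, (1 + hits) / (1 + n_boot)


# Synthetic example: two well-separated Gaussian blobs in two dimensions.
demo_rng = random.Random(42)
data = [(demo_rng.gauss(0, 0.5), demo_rng.gauss(0, 0.5)) for _ in range(30)]
data += [(demo_rng.gauss(10, 0.5), demo_rng.gauss(10, 0.5)) for _ in range(30)]
observed, null_errors, p_value = bootstrap_p_value(data, k=2)
print(f"observed error = {observed:.3f}, p-value = {p_value:.3f}")
```

A small p-value here means that few structure-free resamples cluster as tightly as the original data, which is one way to read "the result is unlikely to be a chance set-up". The choice of reference distribution drives the outcome, so a different definition of chance would need a different resampling step.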
ISSN: 2644-1268
DOI: 10.1109/OJCS.2025.3545261