Outlier Detection and Explanation Method Based on FOLOF Algorithm

Bibliographic Details
Published in Entropy (Basel, Switzerland) Vol. 27; no. 6; p. 582
Main Authors Bai, Lei; Wang, Jiasheng; Zhou, Yu
Format Journal Article
Language English
Published Switzerland MDPI AG 30.05.2025
Summary: Outlier mining constitutes an essential aspect of modern data analytics, focusing on the identification and interpretation of anomalous observations. Conventional density-based local outlier detection methodologies frequently exhibit limitations due to their inherent lack of data preprocessing capabilities, consequently demonstrating degraded performance when applied to novel or heterogeneous datasets. Moreover, the computation of the outlier factor for each sample in these algorithms results in considerably higher computational cost, especially in the case of large datasets. This paper introduces a local outlier detection method named FOLOF (FCM Objective Function-based LOF) through an examination of existing algorithms. The approach starts by applying the elbow rule to determine the optimal number of clusters in the dataset. Subsequently, the FCM objective function is employed to prune the dataset to extract a candidate set of outliers. Finally, a weighted local outlier factor detection algorithm computes the degree of anomaly for each sample in the candidate set. For the analysis, the Golden Section method was used to classify the outliers. The underlying causes of these outliers can be revealed by exploring the anomalous properties of each outlier data point through the outlier factors of each dimension property. This approach has been validated on artificial datasets, the UCI dataset, and an NBA player dataset to demonstrate its effectiveness.
ISSN: 1099-4300
DOI: 10.3390/e27060582
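
Illustrative sketch: The summary describes a prune-then-score pipeline: an FCM objective function selects candidate outliers, and a local outlier factor is then computed only for those candidates. The record does not give the paper's formulas, so the Python below is only a rough sketch under stated assumptions, not the authors' FOLOF implementation. It assumes the textbook fuzzy c-means objective J_m = sum_i sum_j u_ij^m * ||x_i - c_j||^2, treats the samples with the largest per-sample share of J_m as candidates, and scores them with the standard unweighted LOF instead of the weighted variant. The elbow rule and the Golden Section classification are not covered: the cluster count, the initial centres, and the number of candidates to keep are supplied by hand. All function and parameter names (fcm, prune, lof_scores, init_centers, keep) are hypothetical.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def memberships(points, centers, m):
    # Textbook FCM membership: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        d = [max(dist(p, c), 1e-12) for c in centers]  # avoid division by zero
        u.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return u

def fcm(points, init_centers, m=2.0, iters=50):
    # Alternate membership and centre updates for a fixed number of iterations.
    centers = [tuple(c) for c in init_centers]
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        centers = []
        for j in range(len(init_centers)):
            w = [row[j] ** m for row in u]
            total = sum(w)
            centers.append(tuple(
                sum(wi * p[t] for wi, p in zip(w, points)) / total
                for t in range(dim)))
    return centers, memberships(points, centers, m)

def prune(points, init_centers, keep, m=2.0):
    # Assumption: rank samples by their share of J_m and keep the largest ones.
    centers, u = fcm(points, init_centers, m)
    terms = [sum(uij ** m * dist(p, c) ** 2 for uij, c in zip(row, centers))
             for p, row in zip(points, u)]
    order = sorted(range(len(points)), key=lambda i: terms[i], reverse=True)
    return order[:keep]

def lof_scores(points, candidates, k=3):
    # Standard (unweighted) LOF, simplified to exactly k neighbours per sample.
    n = len(points)
    neigh = []
    for i in range(n):
        ds = sorted((dist(points[i], points[j]), j) for j in range(n) if j != i)
        neigh.append(ds[:k])
    kdist = [nb[-1][0] for nb in neigh]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(kdist[j], d) for d, j in neigh[i]]
        return len(reach) / max(sum(reach), 1e-12)

    return {i: sum(lrd(j) for _, j in neigh[i]) / (k * lrd(i))
            for i in candidates}

# Toy data: two tight clusters plus one isolated sample at index 10.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
          (5, 20)]
candidates = prune(points, init_centers=[points[0], points[5]], keep=3)
scores = lof_scores(points, candidates, k=3)
for i in candidates:
    print(i, points[i], round(scores[i], 2))
```

In this toy run the isolated sample is ranked first by the pruning step and receives by far the largest LOF score, while LOF is evaluated for only three of the eleven samples. That reduction is the cost saving the summary attributes to pruning.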