Novel clustering-based approach for Local Outlier Detection

Bibliographic Details
Published in	2016 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 802-811
Main Authors	Haizhou Du, Shengjie Zhao, Daqiang Zhang, Jinsong Wu
Format	Conference Proceeding
Language	English
Published	IEEE 01.04.2016
Subjects	Big data; Chebyshev approximation; Clustering-based; Data mining; Electronic mail; Outlier detection; Robustness; Statistical analysis
Summary:	With the rapid expansion of data scale, big data mining and analysis have attracted increasing attention. Outlier detection, an important data mining task, is widely used in many applications. However, conventional outlier detection methods have difficulty handling large-scale datasets. In addition, most of them can identify only global outliers and are overly sensitive to parameter variation. In this paper, we propose a novel method for robust local outlier detection with statistical parameters, which incorporates clustering-based ideas for dealing with big data. First, the method finds density peaks of the dataset using the 3σ standard. Second, each remaining data object in the dataset is assigned to the same cluster as its nearest neighbor of higher density. Finally, Chebyshev's inequality and density peak reachability are used to identify the local outliers of each cluster. The experimental results demonstrate the efficiency and accuracy of the proposed method in identifying both global and local outliers. Moreover, the method is also shown to be more stable than typical outlier detection methods such as LOF (Local Outlier Factor) and DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
DOI:	10.1109/INFCOMW.2016.7562187
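
Illustrative sketch (not the authors' implementation): the three steps named in the summary can be approximated as below. The record gives no formulas, so several choices here are assumptions: the function name `detect_local_outliers`, the cutoff radius `dc` used to count neighbors, the product `rho * delta` as the score tested against the 3σ rule (borrowed from standard density-peak clustering), the distance to the cluster's peak as a stand-in for "density peak reachability", and `k = 3` in the Chebyshev-style threshold. Chebyshev's inequality guarantees that, for any distribution with finite variance, at most 1/k² of the values lie k or more standard deviations from the mean.

```python
import numpy as np


def detect_local_outliers(X, dc, k=3.0):
    """Return (labels, outlier_mask) for points X of shape (n, d).

    Hypothetical sketch: dc, k and the scoring rules are assumptions,
    not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances (O(n^2) memory; fine for a small sketch).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # Local density: number of other points closer than the cutoff dc.
    rho = (dist < dc).sum(axis=1) - 1

    # Visit points from densest to sparsest; ties are broken by index.
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(n)            # distance to nearest denser point
    parent = np.full(n, -1)        # index of that nearest denser point
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        parent[i], delta[i] = j, dist[i, j]

    # Step 1 (assumed reading of the "3 sigma standard"): a point is a
    # density peak if rho * delta exceeds mean + 3 * std of that score.
    gamma = rho * delta
    is_peak = gamma > gamma.mean() + 3.0 * gamma.std()
    is_peak[order[0]] = True       # guarantee at least one cluster

    # Step 2: every other point joins the cluster of its nearest
    # neighbor of higher density (already labeled, since it is denser).
    labels = np.full(n, -1)
    peaks = []
    for i in order:
        if is_peak[i]:
            labels[i] = len(peaks)
            peaks.append(i)
        else:
            labels[i] = labels[parent[i]]

    # Step 3 (assumed): within each cluster, flag points whose distance
    # to the peak exceeds mean + k * std of the cluster's distances.
    outliers = np.zeros(n, dtype=bool)
    for c, p in enumerate(peaks):
        members = np.flatnonzero(labels == c)
        d = dist[members, p]
        outliers[members[d > d.mean() + k * d.std()]] = True
    return labels, outliers


if __name__ == "__main__":
    # 5 x 5 unit grid plus one distant point at (20, 20).
    pts = [[i, j] for i in range(5) for j in range(5)] + [[20, 20]]
    labels, outliers = detect_local_outliers(pts, dc=1.5)
    print(np.flatnonzero(outliers))
```

On this toy input the sketch forms a single cluster and flags only the distant point. Because thresholds are computed per cluster, data with clusters of different spread would get different thresholds in each cluster, which is the sense in which such a test is local.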