A Multi-Level Outlier Detection Algorithm for High-Dimensional Data Based on Angle Variance

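Outlier detection has long been one of the important tasks in data mining. When applied to high-dimensional data, outlier detection algorithms based on Euclidean distance suffer from unreliable detection accuracy and excessive running time. Building on angle-variance-based outlier detection, this paper proposes a multi-level outlier detection algorithm for high-dimensional data (hybrid outlier detection algorithm based on angle variance for high-dimensional data, HODA). The algorithm draws on rough set theory to analyze the interactions between attributes and eliminate those with little influence; it then analyzes the data distribution along each dimension, partitions the data into grid cells, and identifies the cells that may contain outliers; finally, it computes an angle-variance outlier factor for those candidate cells to screen out the outlying data. Experimental results show...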

Bibliographic Details
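Published in: 计算机应用研究 (Application Research of Computers), Vol. 33, no. 11, pp. 3383-3386
Main Authors: 陈圣楠 (Chen Shengnan), 钱红燕 (Qian Hongyan), 李伟 (Li Wei)
Format: Journal Article
Language: Chinese
Published: 2016
Affiliations: College of Computer Science & Technology, Nanjing University of Aeronautics & Astronautics, Nanjing 210016, China; Information Technology Research Base of Civil Aviation Administration of China, Civil Aviation University of China, Tianjin 300300, China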
ISSN: 1001-3695
DOI: 10.3969/j.issn.1001--3695.2016.11.040

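Summary: Outlier detection has long been one of the important tasks in data mining. When applied to high-dimensional data, outlier detection algorithms based on Euclidean distance suffer from unreliable detection accuracy and excessive running time. Building on angle-variance-based outlier detection, this paper proposes a multi-level outlier detection algorithm for high-dimensional data (hybrid outlier detection algorithm based on angle variance for high-dimensional data, HODA). The algorithm draws on rough set theory to analyze the interactions between attributes and eliminate those with little influence; it then analyzes the data distribution along each dimension, partitions the data into grid cells, and identifies the cells that may contain outliers; finally, it computes an angle-variance outlier factor for those candidate cells to screen out the outlying data. Experimental results show that, compared with ABOD, Fast VOA, and the classic LOF algorithm, HODA significantly shortens running time while maintaining detection accuracy, and scales well.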
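The final stage described in the summary scores points by the variance of the angles they form with other points. The record does not give the paper's exact outlier factor, so the following is only a minimal sketch of the general idea, assuming the plain (unweighted) variance of angles as the score; the function name `angle_variance` and the brute-force pairwise loop are illustrative choices, not the authors' implementation. A point far from the rest of the data sees all other points within a narrow cone, so its angle variance is small.

```python
import numpy as np

def angle_variance(points: np.ndarray) -> np.ndarray:
    """For each point p, return the variance of the angles between
    (a - p) and (b - p) over all unordered pairs {a, b} of other points.

    Assumption: unweighted variance of angles; a low score suggests an outlier.
    This brute-force version is O(n^3) and meant only for small inputs.
    """
    n = len(points)
    scores = np.empty(n)
    for i in range(n):
        # Difference vectors from p to every other point.
        diffs = np.delete(points, i, axis=0) - points[i]
        norms = np.linalg.norm(diffs, axis=1)
        keep = norms > 0  # skip exact duplicates of p (angle undefined)
        unit = diffs[keep] / norms[keep][:, None]
        # Cosines of all pairwise angles; clip to guard against rounding.
        cos = np.clip(unit @ unit.T, -1.0, 1.0)
        upper = np.triu_indices(len(unit), k=1)  # each pair once
        angles = np.arccos(cos[upper])
        scores[i] = angles.var() if angles.size else 0.0
    return scores
```

In a pipeline like the one summarized above, such a score would presumably be computed only for points in the candidate grid cells rather than for the whole data set, which is where the reported running-time savings would come from.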
Bibliography: 51-1196/TP
Chen Shengnan1, Qian Hongyan1,2, Li Wei1 (1. College of Computer Science & Technology, Nanjing University of Aeronautics & Astronautics, Nanjing 210016, China; 2. Information Technology Research Base of Civil Aviation Administration of China, Civil Aviation University of China, Tianjin 300300, China)
high-dimensional data; outlier detection; dimensional reduction; grid; angle variance
Outlier detection is a major task in data mining. Outlier detection methods based on Euclidean distance are poorly suited to high-dimensional data because they can guarantee neither acceptable computational cost nor accuracy. Building on the angle-based outlier detection method, this paper proposes a novel approach called the hybrid outlier detection algorithm based on angle variance for high-dimensional data. The algorithm first uses rough set theory to analyze the interactions between attributes and discards the less important ones. It then divides the data into cubes according to the distribution of the data on ev