Harnessing the Power of GPUs to Speed Up Feature Selection for Outlier Detection

Bibliographic Details
Published in Journal of Computer Science and Technology Vol. 29; no. 3; pp. 408-422
Main Authors Fatemeh Azmandian (Member, IEEE); Ayse Yilmazer (Student Member, IEEE); Jennifer G. Dy (Member, IEEE; Member, ACM); Javed A. Aslam (IEEE); David R. Kaeli (Fellow, IEEE; Member, ACM)
Format Journal Article
Language English
Published Boston Springer US 01.05.2014
Springer Nature B.V
Department of Electrical and Computer Engineering, Northeastern University, Boston 02115-5096, U.S.A.; College of Computer and Information Science, Northeastern University, Boston 02115-5096, U.S.A.
Summary: Acquiring a set of features that emphasize the differences between normal data points and outliers can drastically facilitate the task of identifying outliers. In our work, we present a novel non-parametric evaluation criterion for filter-based feature selection which has an eye towards the final goal of outlier detection. The proposed method seeks the subset of features that represent the inherent characteristics of the normal dataset while forcing outliers to stand out, making them more easily distinguished by outlier detection algorithms. Experimental results on real datasets show the advantage of our feature selection algorithm compared with popular and state-of-the-art methods. We also show that the proposed algorithm is able to overcome the small sample space problem and perform well on highly imbalanced datasets. Furthermore, due to the highly parallelizable nature of the feature selection, we implement the algorithm on a graphics processing unit (GPU) to gain significant speedup over the serial version. The benefits of the GPU implementation are two-fold, as its performance scales very well in terms of the number of features, as well as the number of data points.
Bibliography: 11-2296/TP
feature selection, outlier detection, imbalanced data, GPU acceleration
ISSN: 1000-9000, 1860-4749
DOI: 10.1007/s11390-014-1439-4