Handling class Imbalance problem in Intrusion Detection System based on deep learning

Network intrusion detection system(NIDS) is the most used tool to detect malicious network activities. The NIDS has achieved in the recent years promising results for detecting known and novel attacks, with the adoption of deep learning. However, these NIDSs still have shortcomings. Most of the data...

Full description

Saved in:
Bibliographic Details
Published inInternational Journal of Networking and Computing Vol. 12; no. 2; pp. 467 - 492
Main Authors Mbow, Mariama, Koide, Hiroshi, Sakurai, Kouichi
Format Journal Article
LanguageEnglish
Published IJNC Editorial Committee 2022
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Network intrusion detection system(NIDS) is the most used tool to detect malicious network activities. The NIDS has achieved in the recent years promising results for detecting known and novel attacks, with the adoption of deep learning. However, these NIDSs still have shortcomings. Most of the datasets used for NIDS are highly imbalanced, where the number of samples that belong to normal traffic is much larger than the attack traffic. The problem of imbalanced class skews the results. It limits the deep learning classifier’s performance for minority classes by misleading the classifier to be biased in favor of the majority class. To improve the detection rate for minority classes while ensuring efficiency, this study proposes a hybrid approach to handle the imbalance problem. This hybrid approach is a combination of oversampling with Synthetic Minority Over-Sampling (SMOTE) and Tomek link, an under-sampling method to reduce noise. Additionally, this study uses two deep learning models such as Long Short-Term Memory Network (LSTM) and Convolutional Neural Network (CNN) to provide a better intrusion detection system. The advantage of our proposed model is tested in NSL-KDD, CICIDS2017 datasets. In addition, we evaluate the method in the most recent intrusion detection dataset, CICIDS2018 dataset. We use 10-fold cross validation in this work to train the learning models and an independent test set for evaluation. The experimental results show that in the multi-class classification with NSLKDD dataset, the proposed model reached an overall accuracy and Fscore of 99% and 99.0.2% respectively on LSTM, an overall accuracy and Fscore of 99.70% and 99.27% respectively for CNN. And with CICIDS2017 an overall ac- curacy and Fscore of 99.65% and 98 % respectively on LSTM, an overall accuracy and Fscore of 99.85% and 98.98% respectively for CNN. In CICIDS2018 the proposed method achieved an overall detection rate and Fscore of 95% and 94% respectively.
ISSN:2185-2839
2185-2847
DOI:10.15803/ijnc.12.2_467