A two-stage intrusion detection method based on light gradient boosting machine and autoencoder

Bibliographic Details
Published in	Mathematical biosciences and engineering : MBE, Vol. 20, no. 4, pp. 6966-6992
Main Authors	Zhang, Hao; Ge, Lina; Zhang, Guifen; Fan, Jingwei; Li, Denghui; Xu, Chenyang
Format	Journal Article
Language	English
Published	United States AIMS Press 01.01.2023
Subjects	cybersecurity; feature selection; focal loss; intrusion detection systems; machine learning
Summary:	Intrusion detection systems can detect potential attacks and raise timely alerts. However, the curse of dimensionality and zero-day attacks pose challenges to these systems. From a data perspective, the curse of dimensionality lowers the efficiency of intrusion detection systems. From an attack perspective, the growing number of zero-day attacks overwhelms them. To address these problems, this paper proposes a novel detection framework based on the light gradient boosting machine (LightGBM) and an autoencoder. The framework first applies recursive feature elimination (RFE) for dimensionality reduction. A focal loss (FL) function is then introduced into the LightGBM classifier to strengthen learning on difficult samples. Finally, a two-stage prediction step with LightGBM and the autoencoder is performed. In the first stage, LightGBM makes a preliminary decision. In the second stage, a residual is used to make a secondary decision for samples classified as normal. Experiments were performed on the NSL-KDD and UNSW-NB15 datasets and compared with classical methods. The proposed method was found to outperform these methods while reducing time overhead. The study also compares the proposed method with existing advanced methods, and the results show that it achieves above 90% accuracy, recall, and F1 score on both datasets. The authors conclude that the method remains valid when compared with other advanced techniques.
ISSN:	1551-0018
DOI:	10.3934/mbe.2023301
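
The summary describes a focal loss for difficult samples and a two-stage decision in which samples judged normal by the first stage are re-checked using a residual. The sketch below illustrates that control flow in plain Python and is not the authors' implementation. Several points are assumptions not stated in the record: the focal loss takes the standard binary form with illustrative `gamma` and `alpha` values, the residual is a mean squared reconstruction error, and the second stage flags a sample when that residual exceeds a fixed `threshold`. The names `stage_one` and `reconstruct` are hypothetical stand-ins for a trained LightGBM classifier and a trained autoencoder. The RFE step is omitted.

```python
import math
from typing import Callable, List, Sequence


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Standard binary focal loss for one sample.

    p is the predicted probability of class 1 (attack), y is the true label.
    gamma and alpha are illustrative defaults, not values from the paper.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)          # keep log() finite
    p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) samples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def residual(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between a sample and its reconstruction (assumed form)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def two_stage_predict(
    samples: Sequence[Sequence[float]],
    stage_one: Callable[[Sequence[float]], int],
    reconstruct: Callable[[Sequence[float]], Sequence[float]],
    threshold: float,
) -> List[int]:
    """Return 1 (attack) or 0 (normal) for each sample.

    Stage one: the classifier's pre-decision; attacks are accepted as-is.
    Stage two: samples labelled normal are re-checked with the residual
    (the thresholding rule is an assumption of this sketch).
    """
    labels = []
    for x in samples:
        if stage_one(x) == 1:
            labels.append(1)
        elif residual(x, reconstruct(x)) > threshold:
            labels.append(1)
        else:
            labels.append(0)
    return labels


# Toy stand-ins, purely for illustration (hypothetical, not trained models).
toy_classifier = lambda x: 1 if x[0] > 0.8 else 0
toy_autoencoder = lambda x: [min(v, 0.5) for v in x]  # reconstructs only values <= 0.5

demo = [[0.9, 0.1], [0.2, 0.3], [0.1, 0.95]]
print(two_stage_predict(demo, toy_classifier, toy_autoencoder, threshold=0.05))
# prints [1, 0, 1]
```

In the toy run, the third sample passes the first stage as normal but is flagged in the second stage because its reconstruction error exceeds the threshold, which is the kind of previously unseen pattern the second stage is meant to catch.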