Extended Isolation Forest for Intrusion Detection in Zeek Data

Bibliographic Details
Published in Information (Basel), Vol. 15, no. 7, p. 404
Main Authors Moomtaheen, Fariha; Bagui, Sikha S.; Bagui, Subhash C.; Mink, Dustin
Format Journal Article
Language English
Published Basel: MDPI AG, 01.07.2024
Summary: The novelty of this paper is in determining and using hyperparameters to improve the Extended Isolation Forest (EIF) algorithm, a relatively new algorithm, to detect malicious activities in network traffic. The EIF algorithm is a variation of the Isolation Forest algorithm, known for its efficacy in detecting anomalies in high-dimensional data. Our research assesses the performance of the EIF model on a newly created dataset composed of Zeek Connection Logs, UWF-ZeekDataFall22. To handle the enormous volume of data involved in this research, the Hadoop Distributed File System (HDFS) is employed for efficient and fault-tolerant storage, and the Apache Spark framework, a powerful open-source Big Data analytics platform, is utilized for machine learning (ML) tasks. The best results for the EIF algorithm came from the 0-extension level. We achieved an accuracy of 82.3% for the Resource Development tactic, 82.21% for the Reconnaissance tactic, and 78.3% for the Discovery tactic.
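Note on the extension level: the summary reports that the 0-extension level performed best. As a rough illustration of what that hyperparameter controls, the following is a minimal sketch of how a single EIF-style split can be drawn, based on the general description of the EIF algorithm and not on the paper's Spark implementation. The function names `random_split` and `goes_left` are hypothetical and chosen only for this example. At extension level 0 only one coordinate of the normal vector is non-zero, so the cut is axis-parallel, as in the standard Isolation Forest.

```python
import random


def random_split(points, extension_level, rng):
    """Draw one EIF-style splitting hyperplane for the points at a tree node.

    Returns (normal, intercept). Hypothetical helper for illustration only.
    """
    dim = len(points[0])
    if not 0 <= extension_level <= dim - 1:
        raise ValueError("extension_level must be in [0, dim - 1]")

    # Random normal vector: one standard-normal draw per coordinate.
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]

    # Zero out (dim - extension_level - 1) randomly chosen coordinates.
    # At level 0 a single coordinate stays non-zero (axis-parallel cut);
    # at level dim - 1 nothing is zeroed (fully general hyperplane).
    for i in rng.sample(range(dim), dim - extension_level - 1):
        normal[i] = 0.0

    # Random intercept drawn uniformly inside the bounding box of the points.
    intercept = [
        rng.uniform(min(p[j] for p in points), max(p[j] for p in points))
        for j in range(dim)
    ]
    return normal, intercept


def goes_left(x, normal, intercept):
    """Branching rule: x goes to the left child when (x - intercept) . normal <= 0."""
    return sum((xi - pi) * ni for xi, pi, ni in zip(x, intercept, normal)) <= 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 3-feature records standing in for numeric connection-log features.
    toy_points = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
    normal, intercept = random_split(toy_points, extension_level=0, rng=rng)
    print("normal:", normal)
    print("left branch:", [p for p in toy_points if goes_left(p, normal, intercept)])
```

A full forest would apply such splits recursively and score each record by its average path length; that part is omitted here.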
ISSN: 2078-2489
DOI: 10.3390/info15070404