Enhanced Feature Selection Using Genetic Algorithm for Machine-Learning-Based Phishing URL Detection

Bibliographic Details
Published in: Applied Sciences, Vol. 14, no. 14, p. 6081
Main Authors: Kocyigit, Emre; Korkmaz, Mehmet; Sahingoz, Ozgur Koray; Diri, Banu
Format: Journal Article
Language: English
Published: Basel, MDPI AG, 01.07.2024
Summary: In recent years, the importance of computer security has grown with the rapid advancement of digital technology, widespread Internet use, and the increasing sophistication of cyberattacks. Machine learning has attracted great interest for securing data systems because it can automatically detect and respond to security threats in real time, which is crucial for maintaining the security of computer systems and protecting data from malicious attacks. This study concentrates on detection systems for phishing attacks, a prevalent cyber-threat. These systems assess the features of incoming requests to identify whether they are malicious. As the number of features in these systems increases, feature selection has become an essential pre-processing phase: it identifies the most important features in the available set in order to prevent overfitting, improve model performance, reduce computational cost, and decrease training and execution time. Leveraging genetic algorithms, which simulate natural selection to search for optimal solutions, we propose a novel, locally optimized feature selection method based on genetic algorithms and apply it to a URL-based phishing detection system built on machine learning models. Our research demonstrates that the proposed technique offers a promising strategy for improving the performance of machine learning models.
ISSN: 2076-3417
DOI: 10.3390/app14146081
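
Illustrative sketch: the summary describes genetic-algorithm feature selection with a local optimization step, so a minimal Python example of that general idea may help readers picture it. This is not the authors' method. The synthetic data, the nearest-centroid fitness function, the size penalty, the bit-flip local search, and every name and parameter below (make_toy_data, fitness, local_search, ga_select, pop_size, and so on) are assumptions made only for illustration.

```python
import random


def make_toy_data(n_samples=200, n_features=12, informative=(0, 3, 7), seed=0):
    """Synthetic stand-in for URL feature vectors (assumed, not the paper's data)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 2.0 * label  # only these columns carry class signal
        X.append(row)
        y.append(label)
    return X, y


def fitness(mask, X, y, penalty=0.01):
    """Hold-out accuracy of a nearest-centroid classifier on the selected
    columns, minus a small penalty per selected feature (assumed objective)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    half = len(X) // 2  # first half trains the centroids, second half scores
    centroids = {}
    for c in (0, 1):
        rows = [X[i] for i in range(half) if y[i] == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for i in range(half, len(X)):
        dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(cols, centroids[c]))
                for c in (0, 1)}
        pred = 0 if dist[0] <= dist[1] else 1
        correct += pred == y[i]
    return correct / (len(X) - half) - penalty * len(cols)


def local_search(mask, score):
    """One pass of bit-flip hill climbing: keep a flip only if it improves."""
    best, best_score = list(mask), score(mask)
    for j in range(len(best)):
        cand = list(best)
        cand[j] = 1 - cand[j]
        s = score(cand)
        if s > best_score:
            best, best_score = cand, s
    return best


def ga_select(X, y, pop_size=20, generations=15, mutation_rate=0.05, seed=1):
    """Genetic search over 0/1 feature masks with a locally refined elite."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def score(mask):  # memoize so repeated masks are not re-evaluated
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, X, y)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        children = [local_search(pop[0], score)]  # elitism + local refinement
        while len(children) < pop_size:
            # tournament selection of two parents, three contestants each
            a, b = (max(rng.sample(pop, 3), key=score) for _ in range(2))
            cut = rng.randrange(1, n)  # single-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=score)
    return best, score(best)


if __name__ == "__main__":
    X, y = make_toy_data()
    mask, value = ga_select(X, y)
    print("selected feature indices:", [j for j, bit in enumerate(mask) if bit])
    print("fitness:", round(value, 3))
```

In a real phishing detector, the fitness function would presumably wrap a cross-validated score of the chosen machine learning model on URL-derived features. The article itself should be consulted for the actual operators, classifiers, and evaluation protocol.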