Static Analysis for Malware Classification Using Machine and Deep Learning

Bibliographic Details
Published in: 2023 XLIX Latin American Computer Conference (CLEI), pp. 1-10
Main Authors: Salas, Marcelo Invert Palma; De Geus, Paulo
Format: Conference Proceeding
Language: English
Published: IEEE, 16.10.2023
Summary: Malware, or malicious software, is a general term for any program or code that can harm systems. This hostile, intrusive, and intentionally harmful code uses a variety of techniques to protect itself and evade detection and removal, including code obfuscation, polymorphism, metamorphism, encryption, and encrypted communication. Current state-of-the-art research focuses on applying artificial intelligence techniques to the detection and classification of malware. In this context, this paper proposes a new malware classification approach based on static analysis, using seven machine learning algorithms (LightGBM, XGBoost, Logistic Regression, KNN, SVM, Naive Bayes, and Random Forest) and deep learning fine-tuning. During data engineering, the models use the SelectKBest technique to select the 893 most relevant features for classifying 10,868 malware samples into 9 families, which reduces overfitting and training time. The results show that Gradient Boosting algorithms such as LightGBM, with hyperparameter optimization, exceed reference results from competitions such as Kaggle, achieving a logarithmic loss of 0.00118, an accuracy close to 100%, and prediction times under 2.3 ms, fast enough to classify malware in real-time systems.
ISSN: 2771-5752
DOI: 10.1109/CLEI60451.2023.10346179
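
The summary reports model quality as a logarithmic loss. As a reading aid, the sketch below computes the standard multiclass log loss (the mean negative log-probability assigned to the true class) in plain Python. It is not the authors' evaluation code: the function names `multiclass_log_loss` and `peaked_row`, the clipping constant `eps`, and the three example samples are all assumptions made for illustration; only the count of 9 families comes from the summary.

```python
import math


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to each sample's true class."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip the probability so that log(0) can never occur.
        p = min(max(row[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)


def peaked_row(label, p_true, n_classes=9):
    """Hypothetical prediction: p_true on the true class, the rest spread evenly."""
    rest = (1.0 - p_true) / (n_classes - 1)
    return [p_true if k == label else rest for k in range(n_classes)]


# Illustrative labels only: three samples drawn from 9 families (indices 0-8).
y_true = [0, 4, 8]

# A confident, correct model versus a model that guesses uniformly.
probs_confident = [peaked_row(y, 0.999) for y in y_true]
probs_uniform = [peaked_row(y, 1.0 / 9.0) for y in y_true]

loss_confident = multiclass_log_loss(y_true, probs_confident)
loss_uniform = multiclass_log_loss(y_true, probs_uniform)

print(f"confident model: {loss_confident:.5f}")
print(f"uniform guess:   {loss_uniform:.5f}")
```

Under this definition, uniform guessing over 9 families gives a loss of ln 9 (about 2.197), while a loss of 0.00118 corresponds to a geometric-mean true-class probability of exp(-0.00118), roughly 0.9988.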