Static detection of malicious PowerShell based on word embeddings

Bibliographic Details
Published in	Internet of things (Amsterdam. Online) Vol. 15; p. 100404
Main Authors	Mimura, Mamoru; Tajiri, Yui
Format	Journal Article
Language	English
Published	Elsevier B.V. 01.09.2021
Subjects	Doc2vec; Latent Semantic Indexing; PowerShell; XGBoost
Summary:	While traditional malware relies on executables to function, fileless malware resides in memory to evade traditional detection methods. PowerShell, a legitimate management tool used by system administrators, provides an ideal cover for attackers. Many studies have attempted to detect unknown malware with machine learning techniques. However, few studies address the detection of malicious PowerShell. Previous studies proposed methods of detecting malicious PowerShell with deep neural networks, but these methods require decoding obfuscated samples for dynamic code evaluation. Decoding obfuscated samples is troublesome and often time-consuming. Security devices such as an intrusion detection system (IDS) or a sandbox are located at a point where they can monitor all inbound traffic. In general, this traffic contains too many samples to examine with dynamic analysis. Therefore, a lightweight static method is desirable. In addition, some studies use private datasets to evaluate their methods. In this paper, we propose a static method of detecting malicious PowerShell based on word embeddings. In our method, PowerShell scripts are separated into words, and these words are used as features for machine learning techniques. We improved the feature extraction method by selecting frequent words. To provide reproducibility, we obtained thousands of samples from multiple publicly available websites. The best F1 score reaches 0.995 in a practical environment and 0.985 in 5-fold cross-validation. Furthermore, we identified the malware families of the samples and confirmed that our method is effective against new ones.
ISSN:	2542-6605
DOI:	10.1016/j.iot.2021.100404
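
Illustrative sketch (not from the paper): the summary describes separating PowerShell scripts into words and keeping the frequent ones as features. A minimal version of that step might look like the code below. The token pattern, the function names, and the use of plain count vectors are all assumptions made for illustration; the summary does not give the authors' tokenization rules, and the record's subject terms suggest the actual method feeds words into Doc2vec or Latent Semantic Indexing and classifies with XGBoost, which this sketch does not attempt.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters, digits, '_', '$' or '-'.
# The paper's real tokenization rules are not stated in the summary.
TOKEN_RE = re.compile(r"[A-Za-z0-9_$-]+")


def tokenize(script):
    """Split a PowerShell script into lowercase word tokens."""
    return [token.lower() for token in TOKEN_RE.findall(script)]


def frequent_vocabulary(scripts, top_k):
    """Return the top_k most frequent words across all scripts."""
    counts = Counter()
    for script in scripts:
        counts.update(tokenize(script))
    return [word for word, _ in counts.most_common(top_k)]


def to_count_vector(script, vocabulary):
    """Count how often each vocabulary word occurs in one script."""
    counts = Counter(tokenize(script))
    return [counts[word] for word in vocabulary]


if __name__ == "__main__":
    # Toy strings standing in for scripts; not real samples.
    corpus = ["iex iex foo", "iex bar foo"]
    vocab = frequent_vocabulary(corpus, top_k=2)
    print(vocab)
    print(to_count_vector("foo iex iex baz", vocab))
```

Words outside the selected vocabulary (here `baz`) are simply dropped, which is one plausible reading of "selecting frequent words"; the resulting vectors would then be passed to whatever embedding and classifier the full method uses.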