Static detection of malicious PowerShell based on word embeddings
Published in | Internet of things (Amsterdam. Online) Vol. 15; p. 100404 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Elsevier B.V, 01.09.2021 |
Subjects | |
Summary: | While traditional malware relies on executables to function, fileless malware resides in memory to evade traditional detection methods. PowerShell, a legitimate management tool used by system administrators, provides an ideal cover for attackers. Many studies have attempted to detect unknown malware with machine learning techniques. However, few studies address the detection of malicious PowerShell. Previous studies proposed detecting malicious PowerShell with deep neural networks, but these methods require decoding obfuscated samples for dynamic code evaluation. Decoding obfuscated samples is troublesome and often time-consuming. Security devices such as an intrusion detection system (IDS) or a sandbox are located at a point that can monitor all inbound traffic. In general, this traffic contains too many samples to analyze dynamically. Therefore, a lightweight static method is desirable. In addition, some studies evaluate their methods on private datasets. In this paper, we propose a static method of detecting malicious PowerShell based on word embeddings. In our method, PowerShell scripts are separated into words, and these words are used as features for machine learning techniques. We improved the feature extraction method by selecting frequent words. To provide reproducibility, we obtained thousands of samples from multiple publicly available websites. The best F1 score reaches 0.995 in a practical environment and 0.985 in 5-fold cross-validation. Furthermore, we identified the malware families of the samples and confirmed that our method is effective against new ones. |
---|---|
ISSN: | 2542-6605 |
DOI: | 10.1016/j.iot.2021.100404 |
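
The summary describes splitting PowerShell scripts into words and keeping only frequent words as features. The following is a minimal sketch of that idea, not the authors' implementation: the tokenization rule, the function names (`tokenize`, `build_vocabulary`, `vectorize`), and the `top_k` parameter are all assumptions made for illustration. It also produces plain word-count vectors rather than the word embeddings used in the paper, since the record does not state which embedding model was applied.

```python
import re
from collections import Counter

# Assumed tokenization rule: runs of letters, optionally joined by hyphens,
# so cmdlet names such as Invoke-Expression stay together as one word.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def tokenize(script):
    """Split a PowerShell script into lowercase word tokens."""
    # PowerShell command names are case-insensitive, so case is folded.
    return [token.lower() for token in TOKEN_RE.findall(script)]


def build_vocabulary(scripts, top_k):
    """Return the top_k most frequent words across all scripts."""
    counts = Counter()
    for script in scripts:
        counts.update(tokenize(script))
    return [word for word, _ in counts.most_common(top_k)]


def vectorize(script, vocabulary):
    """Count how often each vocabulary word occurs in one script."""
    counts = Counter(tokenize(script))
    return [counts[word] for word in vocabulary]


if __name__ == "__main__":
    # Hypothetical toy corpus; the paper's dataset is not reproduced here.
    corpus = [
        "Invoke-Expression $payload",
        "Write-Host hello; Invoke-Expression $x",
    ]
    vocabulary = build_vocabulary(corpus, top_k=3)
    for script in corpus:
        print(vectorize(script, vocabulary))
```

The resulting vectors could be passed to any standard classifier. Restricting the vocabulary to frequent words keeps the feature space small, which fits the lightweight static analysis the summary calls for.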