MalFCS: An effective malware classification framework with automated feature extraction based on deep convolutional neural networks
Published in | Journal of parallel and distributed computing Vol. 141; pp. 49 - 58 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | Elsevier Inc, 01.07.2020 |
Subjects | |
Summary: | Identifying the family of a malware sample reveals its malicious intent and attack patterns, which helps analysts process large numbers of malware variants efficiently. Methods based on traditional machine learning often require substantial time and resources for feature engineering. Virtually all existing static analysis methods based on malware visualization are derived from grayscale images, and such a single low-order feature representation may hinder the discovery of hidden features in a malware family. To address these problems, this paper proposes an effective malware classification framework (MalFCS) based on malware visualization and automated feature extraction. MalFCS comprises three main modules: malware visualization, feature extraction, and classification. First, we visualize malware binaries as entropy graphs based on structural entropy. Second, we present a feature extractor based on deep convolutional neural networks that automatically extracts the patterns shared by a family from the entropy graphs. Finally, we propose an SVM classifier that classifies malware based on the extracted features. We evaluate MalFCS on two widely studied benchmark datasets, Malimg and Microsoft. Experimental results show that MalFCS achieves accuracies of 0.997 on Malimg and 1 on Microsoft, which is state-of-the-art performance.
• MalFCS integrates visualization, automated feature extraction and classification.
• Visualizing malware as entropy graphs based on structural entropy.
• Using convolutional neural networks to extract family patterns from entropy graphs.
• Achieving an accuracy of 0.997 and 1 on the Malimg and Microsoft datasets, respectively. |
---|---|
ISSN: | 0743-7315 1096-0848 |
DOI: | 10.1016/j.jpdc.2020.03.012 |
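
The summary states that malware binaries are visualized as entropy graphs based on structural entropy, but this record does not give the construction. As a rough illustration only, the sketch below computes a per-chunk Shannon entropy sequence over a byte string, which is one common way to obtain a structural-entropy signal from a binary. The function names `shannon_entropy` and `entropy_sequence` and the 256-byte chunk size are assumptions made for this example, not details taken from the paper, and the paper's actual entropy-graph rendering may differ.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    counts = Counter(chunk)  # frequency of each byte value in the chunk
    # Sum of p * log2(1/p) over the byte values that occur.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def entropy_sequence(data: bytes, chunk_size: int = 256) -> List[float]:
    """Entropy of each consecutive chunk of `data`.

    `chunk_size` is a hypothetical default chosen for illustration;
    the last chunk may be shorter than `chunk_size`.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        shannon_entropy(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]
```

For instance, calling `entropy_sequence(bytes(256) + bytes(range(256)))` yields two values: 0.0 for the all-zero chunk and 8.0 for the chunk containing every byte value exactly once. Plotting such a sequence against chunk offset gives a simple entropy curve that a downstream feature extractor could consume.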