MalFCS: An effective malware classification framework with automated feature extraction based on deep convolutional neural networks

Bibliographic Details
Published in	Journal of parallel and distributed computing Vol. 141; pp. 49-58
Main Authors	Xiao, Guoqing; Li, Jingning; Chen, Yuedan; Li, Kenli
Format	Journal Article
Language	English
Published	Elsevier Inc 01.07.2020
Subjects	Deep learning; Feature extraction; Information security; Malware classification; Malware visualization

Summary:	Identifying the family of a malware sample helps determine its malicious intent and attack patterns, which makes it possible to analyze large numbers of malware variants efficiently. Methods based on traditional machine learning often require a lot of time and resources for feature engineering. Virtually all existing static analysis methods based on malware visualization are derived from grayscale images, and a single low-order feature representation may be detrimental to discovering hidden features in a malware family. To address these problems, this paper proposes an effective malware classification framework (MalFCS) based on malware visualization and automated feature extraction. MalFCS comprises three main modules: malware visualization, feature extraction, and classification. First, malware binaries are visualized as entropy graphs based on structural entropy. Second, a feature extractor based on deep convolutional neural networks automatically extracts patterns shared by a family from the entropy graphs. Finally, an SVM classifier classifies malware based on the extracted features. MalFCS is evaluated on two widely studied benchmark datasets, Malimg and Microsoft. Experimental results show that MalFCS achieves accuracies of 0.997 on Malimg and 1 on Microsoft, reaching state-of-the-art classification performance.
• MalFCS integrates visualization, automated feature extraction, and classification.
• Malware is visualized as entropy graphs based on structural entropy.
• Convolutional neural networks extract family patterns from entropy graphs.
• MalFCS achieves an accuracy of 0.997 on the Malimg dataset and 1 on the Microsoft dataset.
ISSN:	0743-7315 1096-0848
DOI:	10.1016/j.jpdc.2020.03.012
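
Note on the visualization step: the summary says malware binaries are visualized as entropy graphs based on structural entropy, but this record does not give the construction. As a rough illustration only, the sketch below computes the Shannon entropy of consecutive fixed-size byte chunks, which is one common way to obtain an entropy sequence from a binary. The function names and the 256-byte chunk size are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    counts = Counter(chunk)  # frequency of each byte value in the chunk
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_sequence(data: bytes, chunk_size: int = 256) -> List[float]:
    """Entropy of each consecutive chunk; chunk_size is an assumed value."""
    return [
        shannon_entropy(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


if __name__ == "__main__":
    # Hypothetical input: a constant region followed by a maximally varied one.
    sample = b"\x00" * 256 + bytes(range(256))
    print(entropy_sequence(sample))
```

A constant region yields an entropy near 0 and a region in which every byte value is equally frequent yields 8 bits per byte, so packed or encrypted sections stand out from padding. How MalFCS turns such values into an image for the convolutional network is not described in this record.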