MalFCS: An effective malware classification framework with automated feature extraction based on deep convolutional neural networks

Bibliographic Details
Published in: Journal of Parallel and Distributed Computing, Vol. 141, pp. 49–58
Main Authors: Xiao, Guoqing; Li, Jingning; Chen, Yuedan; Li, Kenli
Format: Journal Article
Language: English
Published: Elsevier Inc., 01.07.2020

Summary: Identifying the family of a malware sample reveals its malicious intent and attack patterns, which helps analysts process large numbers of malware variants efficiently. Methods based on traditional machine learning often require substantial time and resources for feature engineering. Virtually all existing static analysis methods based on malware visualization rely on grayscale images, and such a single low-order feature representation may hinder the discovery of hidden features within a malware family. To address these problems, this paper proposes an effective malware classification framework (MalFCS) based on malware visualization and automated feature extraction. MalFCS consists of three main modules: malware visualization, feature extraction, and classification. First, we visualize malware binaries as entropy graphs based on structural entropy. Second, we present a feature extractor based on deep convolutional neural networks that automatically extracts the patterns a family shares from its entropy graphs. Finally, we use an SVM classifier to classify malware based on the extracted features. We evaluate MalFCS on two widely studied benchmark datasets, Malimg and Microsoft. Experimental results show that, compared with state-of-the-art methods, MalFCS achieves state-of-the-art classification performance, with accuracies of 0.997 and 1 on Malimg and Microsoft, respectively.
• MalFCS integrates visualization, automated feature extraction, and classification.
• Malware is visualized as entropy graphs based on structural entropy.
• Convolutional neural networks extract family patterns from entropy graphs.
• MalFCS achieves accuracies of 0.997 and 1 on the Malimg and Microsoft datasets, respectively.
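The visualization step can be illustrated with a minimal sketch of structural entropy: split a binary into fixed-size chunks, compute the byte-level Shannon entropy of each chunk, and lay the resulting values out as a 2D grayscale grid. The summary above does not specify how MalFCS builds its entropy graphs, so the chunk size (256 bytes), the scaling to 0..255, the row width, and the helper names `shannon_entropy`, `entropy_sequence`, and `entropy_image` are all hypothetical assumptions for illustration, not the authors' method.

```python
import math
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Byte-level Shannon entropy of a chunk, in bits (range 0.0 to 8.0)."""
    if not chunk:
        return 0.0
    n = len(chunk)
    counts = Counter(chunk)  # iterating bytes yields ints 0..255
    # H = sum over byte values of p * log2(1/p), with p = count / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def entropy_sequence(data: bytes, chunk_size: int = 256) -> list[float]:
    """Entropy of each non-overlapping chunk; the last chunk may be shorter.

    Assumption: chunk_size=256 is an illustrative choice, not from the paper.
    """
    return [shannon_entropy(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def entropy_image(seq: list[float], width: int = 16) -> list[list[int]]:
    """Hypothetical layout: scale each entropy to a 0..255 gray level and
    wrap the values into rows of `width` pixels, zero-padding the last row."""
    pixels = [round(e / 8.0 * 255) for e in seq]
    pixels += [0] * (-len(pixels) % width)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


if __name__ == "__main__":
    # 256 zero bytes (no variety) followed by all 256 distinct byte values
    sample = bytes(256) + bytes(range(256))
    seq = entropy_sequence(sample)
    print(seq)                          # → [0.0, 8.0]
    print(entropy_image(seq, width=4))  # → [[0, 255, 0, 0]]
```

Low-entropy regions (padding, zero-filled sections) map to dark pixels and high-entropy regions (packed or encrypted data) map to bright ones, which is the kind of structure a CNN feature extractor could then learn from.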
ISSN: 0743-7315; 1096-0848
DOI: 10.1016/j.jpdc.2020.03.012