Identifying the Author Group of Malwares through Graph Embedding and Human-in-the-Loop Classification

Bibliographic Details
Published in Applied sciences Vol. 11; no. 14; p. 6640
Main Authors Chae, Dong-Kyu, Park, Sung-Jun, Kim, Eujeanne, Hong, Jiwon, Kim, Sang-Wook
Format Journal Article
Language English
Published Basel MDPI AG 01.07.2021
Summary: Malware is developed for various types of malicious attacks, e.g., to gain access to a user’s private information or to take control of a computer system. The identification and classification of malware have been studied extensively in academia and by many companies. Beyond the traditional research areas in this field, including malware detection, malware propagation analysis, and malware family clustering, this paper focuses on identifying the “author group” of a given malware sample as a means of effectively detecting and preventing further malware threats, and of providing evidence for proper legal action. Our framework consists of malware-feature bipartite graph construction, malware embedding based on DeepWalk, and classification of the target malware using a k-nearest neighbors (KNN) classifier. However, our KNN classifier often faced ambiguous cases, where it should say “I don’t know” rather than attempt a prediction with a high risk of misclassification. Therefore, our framework allows human experts to intervene in the classification process to make the final decision. We also developed a graphical user interface that presents the points of ambiguity to help human experts effectively determine the author group of the target malware. We demonstrated the effectiveness of our human-in-the-loop classification framework via extensive experiments using real-world malware data.
ISSN: 2076-3417
DOI: 10.3390/app11146640
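
The summary describes a KNN classifier that should answer “I don’t know” in ambiguous cases and defer to a human expert. The sketch below illustrates one way such an abstaining classifier might look. It is not the authors' implementation: the function name `knn_with_abstention`, the `min_share` vote threshold, and the toy 2-D vectors (standing in for DeepWalk embeddings) are all assumptions made for illustration, and the record does not state which ambiguity criterion the paper actually uses.

```python
import math
from collections import Counter


def knn_with_abstention(query, labeled, k=3, min_share=0.7):
    """Return the majority author group among the k nearest neighbors,
    or None when the vote is too split to trust (hypothetical rule)."""
    # Rank the labeled embeddings by Euclidean distance to the query.
    nearest = sorted(labeled, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    top_label, top_count = votes.most_common(1)[0]
    # Abstain if the winning label's share of the votes is below the threshold;
    # such a case would be handed to a human expert.
    if top_count / len(nearest) < min_share:
        return None
    return top_label


# Toy 2-D vectors standing in for malware embeddings (made-up values).
labeled = [
    ((0.0, 0.0), "group_A"),
    ((0.1, 0.2), "group_A"),
    ((0.2, 0.1), "group_A"),
    ((5.0, 5.0), "group_B"),
    ((5.1, 4.9), "group_B"),
    ((4.9, 5.2), "group_B"),
]

clear_case = knn_with_abstention((0.1, 0.1), labeled)
ambiguous_case = knn_with_abstention((2.6, 2.6), labeled)
```

In this toy data, a query close to one cluster gets a unanimous vote and a label, while a query midway between the clusters gets a split vote and returns `None`, which is the point at which a human-in-the-loop system would ask an expert.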