Novel active learning methods for enhanced PC malware detection in Windows OS

Bibliographic Details
Published in: Expert Systems with Applications, Vol. 41, no. 13, pp. 5843-5857
Main Authors: Nissim, Nir; Moskovitch, Robert; Rokach, Lior; Elovici, Yuval
Format: Journal Article
Language: English
Published: Amsterdam, Elsevier Ltd, 01.10.2014
Summary:
• The challenge of malware signature update is formalized as an active learning task.
• We present and compare several active learning (AL) strategies.
• The best results are achieved using our AL method called Exploitation.
• With our AL methods the number of malware files acquired daily is increased substantially.
• AL methods improve the predictive performance of malware detectors.

The formation of new malware every day poses a significant challenge to anti-virus vendors, since anti-virus tools that rely on manually crafted signatures can identify only known malware instances and their relatively similar variants. To identify new and unknown malware and update their signature repositories, anti-virus vendors must collect new, suspicious files every day. Information security experts then analyze these files manually and label them as malware or benign. Analyzing suspected files is time-consuming, and it is impossible to analyze all of them manually. Consequently, anti-virus vendors use machine learning algorithms and heuristics to reduce the number of suspect files that must be inspected manually. These techniques, however, lack an essential element: they cannot be updated daily.

In this work we introduce a solution for this updatability gap. We present an active learning (AL) framework and introduce two new AL methods that help anti-virus vendors focus their analytical efforts by acquiring those files that are most probably malicious. These new AL methods are designed for and oriented towards new malware acquisition. To test the capability of our methods for acquiring new malware from a stream of unknown files, we conducted a series of experiments over a ten-day period. A comparison of our methods to existing high-performance AL methods and to random selection, which is the naïve method, indicates that the AL methods outperformed random selection for all performance measures.

Our AL methods outperformed the existing AL method in two respects, both related to the number of new malware files acquired daily, the core measure in this study. First, on the 9th day of the experiment, our best-performing AL method, termed “Exploitation”, acquired about 2.6 times more malware files than the existing AL method and 7.8 times more than random selection. Second, while the existing AL method showed a decrease in the number of new malware files acquired over the 10 days, our AL methods showed an increase and a daily improvement in the number acquired. Both results point towards increased efficiency that can possibly assist anti-virus vendors.
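The contrast the summary draws between acquiring the files that are "most probably malicious" and other selection policies can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how Exploitation ranks files, so the code assumes a hypothetical detector that assigns each unknown file a maliciousness score in [0, 1]. The file names, scores, labels, and the function names `select_exploitation`, `select_uncertainty`, and `select_random` are all invented for illustration.

```python
import random

# Hypothetical detector scores for one day's stream of unknown files
# (closer to 1.0 = more likely malicious). Invented for illustration.
scores = {
    "a.exe": 0.95, "b.exe": 0.90, "c.exe": 0.55,
    "d.exe": 0.48, "e.exe": 0.10, "f.exe": 0.05,
}
# Hypothetical labels an expert would assign after manual analysis.
truth = {
    "a.exe": True, "b.exe": True, "c.exe": True,
    "d.exe": False, "e.exe": False, "f.exe": False,
}

def select_exploitation(scores, budget):
    """Exploitation-style policy (assumed): take the files the detector
    considers most probably malicious."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:budget]

def select_uncertainty(scores, budget, threshold=0.5):
    """Classic uncertainty sampling: take the files whose score is
    closest to the decision threshold."""
    ranked = sorted(scores, key=lambda name: abs(scores[name] - threshold))
    return ranked[:budget]

def select_random(scores, budget, rng):
    """Naive baseline: pick files uniformly at random."""
    return rng.sample(sorted(scores), budget)

def count_malware(selected, truth):
    """Number of selected files that the expert labels as malware."""
    return sum(truth[name] for name in selected)

budget = 2  # files the experts can analyze manually per day
for label, picked in [
    ("exploitation", select_exploitation(scores, budget)),
    ("uncertainty", select_uncertainty(scores, budget)),
    ("random", select_random(scores, budget, random.Random(0))),
]:
    print(label, picked, count_malware(picked, truth))
```

In a daily loop, the labeled files would be added to the training set and the detector retrained before scoring the next day's stream. On this toy data the exploitation-style policy spends the whole manual-analysis budget on true malware, while uncertainty sampling spends part of it on a borderline benign file. The toy data only shows the mechanics and says nothing about the magnitudes reported in the summary.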
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2014.02.053