Detecting Internet of Things Malware on Evidence Generation

Bibliographic Details
Published in	IEEE Internet of Things Journal, Vol. 11, no. 22, pp. 36950-36964
Main Authors	Han, YoonSeok; Seo, HyungBin; Yoon, MyungKeun
Format	Journal Article
Language	English
Published	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 15.11.2024
Subjects	Accuracy; Anti-virus software; Bipartite graph; Clustering; Cybersecurity; Datasets; Deep learning; Domain Name System; Graph theory; Internet of Things (IoT); Malware; Security; Signature generation; Strings; Uniqueness
ISSN	2327-4662
DOI	10.1109/JIOT.2024.3439528

Summary:	Malware is a real threat to the Internet of Things (IoT). Although commercial antivirus solutions can detect malware files and provide labels indicating malware types or families, they provide no clear evidence explaining the detection. Therefore, even security experts using antivirus solutions do not know why some files are reported as malicious, and they hesitate to take immediate action. In this article, we study this problem from the viewpoint of antivirus solution users instead of product developers or sellers. We present a new data-driven scheme that automatically generates a set of readable common strings from IoT malware files as detection evidence. These generated string signatures not only provide clear detection evidence for suspicious files but can also be used as unique high-precision detection criteria. The scheme divides any long evasive string embedded in malware files into short n-grams to mitigate detection evasion, and it selects a limited number of representative n-grams on a bipartite graph, which improves the efficiency and accuracy of clustering. For each cluster, it generates a set of n-grams that serves as unique detection evidence. Through experiments with real malware data sets, including public data sets for experimental reproducibility, we confirm that the new scheme not only detects malware files as accurately as the current state of the art (SOTA), in particular with no benign files mistakenly considered malicious, but also provides readable strings as detection evidence, which previous work has not achieved.
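
The n-gram idea in the summary can be illustrated with a small Python sketch. It is not the authors' algorithm: the summary gives no implementation details, so the function names, the n-gram length of 4, the `min_support` and `limit` parameters, and the toy strings are all assumptions made for illustration. The clustering step is omitted, and the malware files are treated as one already-formed cluster.

```python
from collections import defaultdict


def ngrams(s, n=4):
    """Return the set of character n-grams of s (empty if s is shorter than n)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def file_ngrams(strings, n=4):
    """Union of the n-grams of every string extracted from one file."""
    grams = set()
    for s in strings:
        grams |= ngrams(s, n)
    return grams


def build_bipartite(files, n=4):
    """Bipartite file/n-gram graph, stored as n-gram -> set of file ids."""
    gram_to_files = defaultdict(set)
    for file_id, strings in files.items():
        for g in file_ngrams(strings, n):
            gram_to_files[g].add(file_id)
    return gram_to_files


def candidate_signature(malware, benign, n=4, min_support=2, limit=3):
    """Pick up to `limit` n-grams shared by at least `min_support` malware
    files and absent from every benign file (a hypothetical selection rule)."""
    graph = build_bipartite(malware, n)
    benign_grams = set()
    for strings in benign.values():
        benign_grams |= file_ngrams(strings, n)
    candidates = [
        (g, ids) for g, ids in graph.items()
        if len(ids) >= min_support and g not in benign_grams
    ]
    # Highest support first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda item: (-len(item[1]), item[0]))
    return [g for g, _ in candidates[:limit]]


def matches(strings, signature, n=4):
    """A file matches when it contains every n-gram of a non-empty signature."""
    grams = file_ngrams(strings, n)
    return bool(signature) and all(g in grams for g in signature)


# Toy data (invented): strings as they might be extracted from binaries.
malware = {"m1": ["/bin/busybox wget"], "m2": ["busybox tftp"]}
benign = {"b1": ["usage: wget [options]"]}

signature = candidate_signature(malware, benign)
print(signature)
print(matches(["/tmp/busybox ftpget"], signature))  # unseen variant
print(matches(benign["b1"], signature))
```

Because the signature is built from short n-grams rather than whole strings, the unseen variant still matches even though its full string differs from both training samples, while the benign file, which shares only `wget` with one sample, does not.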