M3F: A novel multi-session and multi-protocol based malware traffic fingerprinting
Published in | Computer networks (Amsterdam, Netherlands : 1999) Vol. 227; p. 109723 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | Elsevier B.V 01.05.2023 |
Summary: | In recent years, cyber attacks have become increasingly frequent, with a severe negative impact on public life and social order. Accurately and quickly identifying malware traffic within massive volumes of network traffic is one of the keys to defending against network attacks. Traditional detection methods, whether signature-based or machine-learning-based, use a single packet or session as the smallest detection unit. However, malware usually creates more than one session while executing its malicious functions, and detecting these sessions in isolation, without contextual information, is prone to false negatives and false positives.
To address these problems, we propose M3F, a Multi-session and Multi-protocol based Malware traffic Fingerprinting method that uses multiple related sessions with different protocols as the smallest classification unit. We associate sessions of different protocols with one another to form session sequences. Each session in a session sequence is represented by a state with three features, so that a session sequence can be transformed into a state sequence. For each malware family, we learn a first-order homogeneous Markov chain from its state sequences and use it as the family's traffic fingerprint (its M3F). M3Fs reflect the dynamics of malware communication traffic. An approximate matching technique is used to cope with the evolution of malware, which improves the recall rate. Armed with M3F, we can locate malware traffic in massive network traffic. We verify M3F on a large amount of malicious traffic. The experimental results show that M3F achieves 99.41% precision and 99.54% recall, both outperforming the baseline methods. It does not mark normal traffic as malicious traffic. Additionally, M3F is highly interpretable and serves as a network-level representation of the malware’s behavior. |
---|---|
ISSN: | 1389-1286 1872-7069 |
DOI: | 10.1016/j.comnet.2023.109723 |
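
The summary describes learning a first-order homogeneous Markov chain from a malware family's state sequences. The sketch below illustrates only that general idea and is not the authors' implementation. The three-feature state tuples, the function names `fit_markov_chain` and `avg_log_likelihood`, the additive smoothing, and the average log-likelihood score are all assumptions made for illustration; the abstract names neither the three features nor the details of the approximate matching technique.

```python
import math
from collections import defaultdict


def fit_markov_chain(sequences, smoothing=1.0):
    """Estimate initial and transition probabilities from state sequences.

    Additive smoothing is an assumption of this sketch, not a detail
    taken from the paper.
    """
    states = sorted({s for seq in sequences for s in seq})
    init_counts = defaultdict(float)
    trans_counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        if not seq:
            continue
        init_counts[seq[0]] += 1
        for a, b in zip(seq, seq[1:]):
            trans_counts[a][b] += 1

    n = len(states)
    total_init = sum(init_counts.values()) + smoothing * n
    init = {s: (init_counts[s] + smoothing) / total_init for s in states}

    trans = {}
    for a in states:
        row_total = sum(trans_counts[a].values()) + smoothing * n
        if row_total == 0:
            # No outgoing transitions observed and no smoothing: fall back to uniform.
            trans[a] = {b: 1.0 / n for b in states}
        else:
            trans[a] = {
                b: (trans_counts[a][b] + smoothing) / row_total for b in states
            }
    return init, trans


def avg_log_likelihood(seq, init, trans, floor=1e-9):
    """Length-normalised log-likelihood of a state sequence under the chain.

    States or transitions never seen in training receive the probability `floor`.
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    logp = math.log(init.get(seq[0], floor))
    for a, b in zip(seq, seq[1:]):
        logp += math.log(trans.get(a, {}).get(b, floor))
    return logp / len(seq)


# Hypothetical states: (protocol, direction, size bucket). These feature
# names are placeholders; the abstract does not specify the three features.
DNS = ("DNS", "out", "small")
TCP = ("TCP", "out", "small")
HTTP = ("HTTP", "out", "large")

family_sequences = [
    [DNS, TCP, HTTP],
    [DNS, TCP, HTTP, HTTP],
    [DNS, HTTP],
]

init, trans = fit_markov_chain(family_sequences)
seen_score = avg_log_likelihood([DNS, TCP, HTTP], init, trans)
odd_score = avg_log_likelihood([HTTP, TCP, DNS], init, trans)
print(seen_score > odd_score)
```

In this toy setup, a sequence that follows the family's usual session order scores higher than the same sessions in reverse order. A detector built on this sketch would compare the score against a threshold chosen on validation data, which is again an assumption rather than something the abstract states.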