LogShield: A Transformer-based APT Detection System Leveraging Self-Attention
Main Authors | , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 09.11.2023 |
Summary: | Cyber attacks are often identified using system and network logs. There have
been significant prior works that utilize provenance graphs and ML techniques
to detect attacks, specifically advanced persistent threats, which are very
difficult to detect. Lately, there have been studies where transformer-based
language models are being used to detect various types of attacks from system
logs. However, no such attempts have been made in the case of APTs. In
addition, existing state-of-the-art techniques that use system provenance
graphs lack a data processing framework generalized across datasets for
optimal performance. To mitigate this limitation, and to explore the
effectiveness of transformer-based language models, this paper proposes
LogShield, a framework designed to detect APT attack patterns leveraging the
power of self-attention in transformers. We incorporate customized embedding
layers to effectively capture the context of event sequences derived from
provenance graphs. While acknowledging the computational overhead associated
with training transformer networks, our framework surpasses existing LSTM-based
and language models in APT detection. We adopted the model parameters and
training procedure from the RoBERTa model and conducted extensive experiments
on well-known APT datasets (DARPA OpTC and DARPA TC E3). Our framework achieved
superior F1 scores of 98% and 95% on the two datasets respectively, surpassing
the F1 scores of 96% and 94% obtained by LSTM models. Our findings suggest that
LogShield's performance improves with larger datasets, and they demonstrate its
potential for generalization across diverse domains. These findings contribute
to the advancement of APT attack detection methods and underscore the
significance of transformer-based architectures in addressing security
challenges in computer systems. |
---|---|
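The abstract centers on self-attention over event sequences derived from provenance graphs. As an illustrative sketch only (not the authors' implementation, which follows RoBERTa), the scaled dot-product self-attention at the core of any transformer encoder can be written over a toy sequence of event embeddings:

```python
# Illustrative sketch: scaled dot-product self-attention over a toy sequence
# of "event embeddings". Real transformer layers (e.g. RoBERTa) add learned
# Q/K/V projections, multiple heads, and feed-forward sublayers; here
# Q = K = V = X to keep the example minimal.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    """X: list of d-dimensional vectors, one per log event.
    Returns one attended vector per event."""
    d = len(X[0])
    out = []
    for q in X:
        # Attention score of this event against every event in the sequence,
        # scaled by sqrt(d) as in "Attention Is All You Need".
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        # Output is the attention-weighted mixture of all event vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

# Toy 3-event sequence with 2-dimensional embeddings (hypothetical values).
events = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(events)
print(len(attended), len(attended[0]))  # 3 2
```

Each output vector is a convex combination of all event embeddings, which is how a transformer lets every log event condition on the full sequence context, rather than only on its predecessors as in an LSTM.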
DOI: | 10.48550/arxiv.2311.05733 |