HSTF-Model: An HTTP-based Trojan detection model via the Hierarchical Spatio-temporal Features of Traffics

Bibliographic Details
Published in: Computers & Security, Vol. 96, p. 101923
Main Authors: Xie, Jiang; Li, Shuhao; Yun, Xiaochun; Zhang, Yongzheng; Chang, Peng
Format: Journal Article
Language: English
Published: Amsterdam, Elsevier Ltd, 01.09.2020
Elsevier Sequoia S.A
ISSN: 0167-4048, 1872-6208
DOI: 10.1016/j.cose.2020.101923

Summary: HTTP-based Trojans are extremely threatening and are difficult to detect effectively because of their concealment and confusion. Previous detection methods usually generalize poorly because they rely on outdated datasets and manual feature extraction; as a result, they perform well on their own private datasets but perform poorly, or even fail to work, in real network environments. In this paper, we propose an HTTP-based Trojan detection model via the Hierarchical Spatio-Temporal Features of traffics (HSTF-Model), based on a formalized description of the spatio-temporal behavior of traffic at both the packet level and the flow level. In this model, we employ a Convolutional Neural Network (CNN) to extract spatial information and Long Short-Term Memory (LSTM) to extract temporal information. In addition, we present a dataset consisting of Benign and Trojan HTTP Traffic (BTHT-2018). Experimental results show that our model achieves high accuracy (an F1 of 98.62% ~ 99.81% and an FPR of 0.34% ~ 0.02% on BTHT-2018). More importantly, our model has a large advantage over other related methods in generalization ability: HSTF-Model trained on BTHT-2018 reaches an F1 of 93.51% on the public dataset ISCX-2012, which is 20+% better than the best of the related machine learning methods.
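The summary reports results as F1 and FPR (false positive rate). As a reading aid, the following is a minimal sketch of how these two metrics are conventionally computed from confusion-matrix counts, assuming Trojan traffic is the positive class. The function name `f1_and_fpr` and the example counts are hypothetical illustrations and are not taken from the paper or its datasets.

```python
def f1_and_fpr(tp, fp, tn, fn):
    """Return (F1, FPR) from confusion-matrix counts.

    Assumption: Trojan traffic is the positive class, so
    tp = Trojan flows flagged, fp = benign flows flagged,
    tn = benign flows passed, fn = Trojan flows missed.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    # FPR is the share of benign flows that were wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return f1, fpr


# Hypothetical counts for illustration only (not results from the paper).
f1, fpr = f1_and_fpr(tp=90, fp=10, tn=990, fn=10)
print(f"F1 = {f1:.2%}, FPR = {fpr:.2%}")
```

Because FPR is normalized by the number of benign flows, a small FPR can still mean many false alerts on a network where benign traffic dominates, which is why it is reported alongside F1.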