Semi-Supervised Encrypted Traffic Classification With Deep Convolutional Generative Adversarial Networks

Bibliographic Details
Published in	IEEE Access, Vol. 8, pp. 118-126
Main Authors	Iliyasu, Auwal Sani; Deng, Huifang
Format	Journal Article
Language	English
Published	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2020
Subjects	Classification; Classifiers; Communications traffic; Datasets; Deep convolutional generative adversarial network; Deep learning; Encrypted traffic classification; Encryption; Generative adversarial networks; Generators; Inspection; Labeling; Machine learning; Payloads; Protocol (computers); Protocols; Semi-supervised learning
Summary:	Network traffic classification serves as a building block for important tasks such as security and quality-of-service management. The field has been studied for a long time, and many techniques, including classical machine learning and deep learning methods, are currently available. However, the emergence of stronger encryption protocols has created new challenges. One of these is capturing and labeling large amounts of encrypted traffic data, especially for training deep learning classifiers, because current techniques rely on deep packet inspection (DPI) tools, which perform poorly on encrypted traffic. In this paper, we propose a semi-supervised learning approach using a Deep Convolutional Generative Adversarial Network (DCGAN). The basic idea is to use the samples generated by DCGAN generators, together with unlabeled data, to improve the performance of a classifier trained on a few labeled samples, thus alleviating the difficulties associated with collecting and labeling large datasets. To demonstrate the efficacy of our approach, we evaluated our model on a self-collected dataset of the recently established QUIC protocol as well as the publicly available ISCX VPN-NonVPN dataset. Our approach achieves 89% and 78% accuracy on the QUIC and ISCX VPN-NonVPN datasets, respectively, with a very small number of labeled samples (just 10% of the dataset).
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2019.2962106
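
The semi-supervised idea described in the summary (labeled samples, unlabeled samples, and generator output all contributing to one classifier) can be illustrated with a small sketch. The record does not give the paper's loss function, so the code below assumes a common semi-supervised GAN formulation in which the discriminator doubles as a classifier with K traffic classes plus one extra "generated" class. The function names, argument layout, and the equal weighting of the three terms are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def discriminator_loss(labeled, unlabeled, generated, num_classes):
    """Assumed semi-supervised GAN discriminator loss (K real classes + 1 fake class).

    labeled:   list of (logits, class_index) pairs for the few labeled flows
    unlabeled: list of logits for real flows without labels
    generated: list of logits for samples produced by the generator
    Every logits list has num_classes + 1 entries; the last one is the fake class.
    """
    fake = num_classes  # index of the extra "generated" class

    # Supervised term: cross-entropy on the labeled samples.
    supervised = -sum(z[y] - logsumexp(z) for z, y in labeled) / len(labeled)

    # Unlabeled real samples: should fall into any of the K real classes,
    # i.e. maximise log p(real) = logsumexp(real logits) - logsumexp(all logits).
    real = -sum(logsumexp(z[:fake]) - logsumexp(z) for z in unlabeled) / len(unlabeled)

    # Generated samples: should be assigned to the fake class.
    fake_term = -sum(z[fake] - logsumexp(z) for z in generated) / len(generated)

    # Equal weighting of the three terms is an assumption of this sketch.
    return supervised + real + fake_term


# Hypothetical toy logits for K = 2 traffic classes.
uninformed = discriminator_loss([([0.0, 0.0, 0.0], 0)], [[0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0]], 2)
confident = discriminator_loss([([5.0, 0.0, 0.0], 0)], [[3.0, 3.0, -5.0]], [[-5.0, -5.0, 5.0]], 2)
```

In this toy setup, a discriminator that separates the classes and flags generated samples (`confident`) receives a lower loss than one that outputs uniform logits (`uninformed`). In a full model the logits would come from a convolutional network trained alongside the generator; that part is omitted here.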