Resilient Dynamic Channel Access via Robust Deep Reinforcement Learning

Bibliographic Details
Published in	IEEE Access, Vol. 9, pp. 163188-163203
Main Authors	Wang, Feng; Zhong, Chen; Gursoy, M. Cenk; Velipasalar, Senem
Format	Journal Article
Language	English
Published	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2021
Subjects	Adaptation models; Adversarial policies; Decision making; Deep learning; Deep reinforcement learning; Defense; Defense strategies; Dynamic channel access; Heuristic algorithms; Jamming; Jamming attacks; Markov processes; Policies; Proportional integral derivative; Reinforcement learning; Sensitivity; Switches; Wireless communication; Wireless communications
Summary:	As the applications of deep reinforcement learning (DRL) in wireless communications grow, the sensitivity of DRL-based wireless communication strategies to adversarial attacks has started to draw increasing attention. To address this sensitivity and alleviate the resulting security concerns, in this paper we consider a victim user that performs DRL-based dynamic channel access and an attacker that executes DRL-based jamming attacks to disrupt the victim. Both the victim and the attacker are therefore DRL agents that can interact with each other, retrain their models, and adapt to the opponent's policy. In this setting, we first develop an adversarial jamming attack policy that aims to minimize the accuracy of the victim's decision making on dynamic channel access. We then propose three defense strategies against such an attacker: diversified defense with proportional-integral-derivative (PID) control, diversified defense with an imitation attacker, and defense via orthogonal policies. We design these strategies to maximize the attacked victim's accuracy and evaluate their performance.
Bibliography:	USDOE AR0000940
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2021.3133506