Jamming Strategy Optimization through Dual Q-Learning Model against Adaptive Radar

Bibliographic Details
Published in: Sensors (Basel, Switzerland), Vol. 22, No. 1, p. 145
Main Authors: Liu, Hongdi; Zhang, Hongtao; He, Yuan; Sun, Yong
Format: Journal Article
Language: English
Published: Switzerland, MDPI AG, 26.12.2021

Summary: Modern adaptive radars can switch working modes to perform various missions and simultaneously use pulse-parameter agility within each mode to improve survivability, which leads to a multiplicative increase in decision-making complexity and a decline in the performance of existing jamming methods. In this paper, a two-level jamming decision-making framework is developed, on which a dual Q-learning (DQL) model is proposed to optimize the jamming strategy, and a dynamic method for jamming effectiveness evaluation is designed to update the model. Specifically, the jamming procedure is modeled as a finite Markov decision process. On this basis, the high-dimensional jamming action space is decomposed into two low-dimensional subspaces containing the jamming mode and the pulse parameters, respectively, and two specialized, interacting Q-learning models are built to obtain the optimal solution. Moreover, jamming effectiveness is evaluated by measuring the distance between indicator vectors to provide feedback for the DQL model, where the indicators are dynamically weighted to adapt to the environment. Experiments demonstrate the advantage of the proposed method in learning the radar's joint strategy of mode switching and parameter agility: the average jamming-to-signal ratio (JSR) improves by 4.05% while the convergence time is reduced by 34.94% compared with standard Q-learning.
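The summary describes the core mechanism: the high-dimensional jamming action space is split into a mode subspace and a pulse-parameter subspace, each handled by its own Q-table, with the two tables coupled through a shared state and reward. Below is a minimal illustrative sketch of that decomposition, not the authors' implementation; the state/action sizes, learning rate, discount factor, epsilon-greedy selection, and the placeholder environment (which stands in for the paper's dynamically weighted indicator-vector reward) are all assumptions made for the example.

```python
import numpy as np

# Sketch of a dual Q-learning decomposition (illustrative only):
# one Q-table chooses the jamming mode, a second Q-table chooses the
# pulse-parameter setting conditioned on that mode. All sizes and
# hyperparameters below are hypothetical, not taken from the paper.

N_STATES = 8     # hypothetical number of observed radar states
N_MODES = 4      # hypothetical number of jamming modes
N_PARAMS = 6     # hypothetical number of discretized pulse-parameter settings
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

q_mode = np.zeros((N_STATES, N_MODES))             # Q-table over jamming modes
q_param = np.zeros((N_STATES, N_MODES, N_PARAMS))  # Q-table over parameters, per mode

rng = np.random.default_rng(0)

def select(q_row, eps=EPS):
    """Epsilon-greedy action selection over one Q-table row."""
    if rng.random() < eps:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))

def step(state, mode, param):
    """Placeholder environment: returns (next_state, reward).
    In the paper the reward comes from a dynamically weighted
    indicator-vector distance; here it is random for illustration."""
    return int(rng.integers(N_STATES)), float(rng.normal())

state = 0
for episode in range(1000):
    mode = select(q_mode[state])
    param = select(q_param[state, mode])
    next_state, reward = step(state, mode, param)

    # Interacting updates: both targets use the best parameter-level value
    # under the best next mode, which couples the two tables.
    best_next_mode = int(np.argmax(q_mode[next_state]))
    best_next_value = np.max(q_param[next_state, best_next_mode])
    q_param[state, mode, param] += ALPHA * (
        reward + GAMMA * best_next_value - q_param[state, mode, param]
    )
    q_mode[state, mode] += ALPHA * (
        reward + GAMMA * best_next_value - q_mode[state, mode]
    )
    state = next_state
```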
ISSN: 1424-8220
DOI: 10.3390/s22010145