Memory‐based deep reinforcement learning for cognitive radar target tracking waveform resource management

A cognitive radar (CR) system can offer enhanced target tracking performance due to its intelligence on the perception‐action cycle, wherein a CR adaptively allocates the limited transmitting resources based on its perception of surrounding environments. To effectively manage the transmit waveform r...

Full description

Saved in:

Bibliographic Details
Published in	IET radar, sonar & navigation Vol. 17; no. 12; pp. 1822 - 1836
Main Authors	Qin, Jiahao, Zhu, Mengtao, Pan, Zesi, Li, Yunjie, Li, Yan
Format	Journal Article
Language	English
Published	Wiley 01.12.2023
Subjects	adaptive radar decision making intelligent networks
Online Access	Get full text

Cover

Loading…

More Information
Summary:	A cognitive radar (CR) system can offer enhanced target tracking performance due to its intelligence on the perception‐action cycle, wherein a CR adaptively allocates the limited transmitting resources based on its perception of surrounding environments. To effectively manage the transmit waveform resource for the target tracking task, CR resource management problem is formulated under the partially observable Markov decision process framework. The sequential decision‐making and the inherent partial observability for target tracking problem are considered. In the proposed method, a long short‐term memory (LSTM)‐based twin delayed deep deterministic policy gradient (TD3) algorithm is developed to effectively solve the problem. A reward function is designed considering Haykin's cognitive executive attention mechanism for radar systems such that the CR resource management policy has stability in the decision of transmit waveform, which follows the principle of minimum disturbance. Simulation results demonstrate the superiority of the proposed LSTM memory‐based TD3 with improved target tracking performance and increased mean rewards for CR. This paper formulates a cognitive radar resource management problem under the partially observable Markov decision process (POMDP) framework. And a long short‐term memory (LSTM) based twin delayed deep deterministic policy gradient (TD3) algorithm is developed to effectively solve the problem.
ISSN:	1751-8784 1751-8792
DOI:	10.1049/rsn2.12469