Latent-Maximum-Entropy-Based Cognitive Radar Reward Function Estimation With Nonideal Observations

Bibliographic Details
Published in	IEEE Transactions on Aerospace and Electronic Systems, Vol. 60, no. 5, pp. 6656-6670
Main Authors	Zhang, Luyao; Zhu, Mengtao; Qin, Jiahao; Li, Yunjie
Format	Journal Article
Language	English
Published	IEEE 01.10.2024
Subjects	Cognition; Cognitive radar; Cognitive radar (CR); Entropy; expectation–maximization (EM); Interference; inverse cognition; inverse reinforcement learning; latent maximum entropy (LME); Optimization; Stochastic processes; Trajectory
Summary:	The concept of “inverse cognition” has recently emerged and garnered significant research attention in the radar community, spanning inverse filtering, inverse cognitive radar (I-CR), and the design of smart interference for counter-adversarial autonomous systems (i.e., cognitive radars). For instance, identifying whether an adversary cognitive radar's actions (such as waveform selection and beam scheduling) are consistent with constrained utility maximization and, if so, estimating the utility function has led to recent formulations of I-CR. In this context, we address the challenge of estimating unknown and complex utility functions from nonideal action observations, where “nonideal” means that observations may be missing or nonoptimal. In this article, we assume that the adversary cognitive radar (CR) optimizes its action policy by maximizing some form of expected utility function with unknown and complex structure over long time horizons. We then design an inverse reinforcement learning (IRL) method for nonideal observations and illustrate its applicability. The nonideal factors are treated as latent variables, and the I-CR problem is formulated as a latent information inference problem. An expectation–maximization (EM)-based algorithm is then developed to solve the resulting nonconvex, nonlinear optimization problem iteratively through a Lagrangian relaxation reformulation. The proposed method is evaluated and compared using simulated CR target tracking scenarios with Markov decision process (MDP) and partially observable MDP settings. Experimental results verify the robustness, effectiveness, and superiority of the proposed method.
ISSN:	0018-9251, 1557-9603
DOI:	10.1109/TAES.2024.3406671