Latent-Maximum-Entropy-Based Cognitive Radar Reward Function Estimation With Nonideal Observations

Bibliographic Details
Published in: IEEE Transactions on Aerospace and Electronic Systems, Vol. 60, no. 5, pp. 6656–6670
Main Authors: Zhang, Luyao; Zhu, Mengtao; Qin, Jiahao; Li, Yunjie
Format: Journal Article
Language: English
Published: IEEE, 01.10.2024
Summary: The concept of “inverse cognition” has recently emerged and has garnered significant research attention in the radar community, spanning inverse filtering, inverse cognitive radar (I-CR), and the design of smart interference against counter-adversarial autonomous systems (i.e., cognitive radars). For instance, identifying whether an adversary cognitive radar's actions (such as waveform selection and beam scheduling) are consistent with constrained utility maximization and, if so, estimating the utility function has led to recent formulations of I-CR. In this context, we address the challenge of estimating unknown and complex utility functions from nonideal action observations, where “nonideal” means that action observations may be missing or nonoptimal. We assume that the adversary cognitive radar optimizes its action policy by maximizing some form of expected utility function, with an unknown and complex structure, over long time horizons. We then design an inverse reinforcement learning (IRL) method under nonideal observations and illustrate its applicability. The nonideal factors are treated as latent variables, and the I-CR problem is formulated as a latent-information inference problem. An expectation–maximization (EM)-based algorithm is then developed to iteratively solve the resulting nonconvex, nonlinear optimizations through a Lagrangian relaxation reformulation. The performance of the proposed method is evaluated and compared in simulated cognitive radar target-tracking scenarios under Markov decision process (MDP) and partially observable MDP (POMDP) settings. Experimental results verify the robustness, effectiveness, and superiority of the proposed method.
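The EM treatment of missing observations described in the summary can be illustrated, in a much-simplified form, with a toy maximum-entropy (softmax) action policy: the E-step fills in missing actions with the current policy's distribution, and the M-step takes a feature-matching gradient step on the reward parameters. This is a minimal sketch under assumed, hypothetical names and values (a stateless action model, not the paper's MDP/POMDP formulation or its Lagrangian relaxation):

```python
import numpy as np

# Toy setup (all values hypothetical): 4 actions, each with a feature
# vector; the adversary picks actions via a maximum-entropy policy
# pi(a) proportional to exp(theta . phi(a)).
rng = np.random.default_rng(0)
phi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]])
theta_true = np.array([2.0, -1.0])

def policy(theta):
    logits = phi @ theta
    p = np.exp(logits - logits.max())  # stabilized softmax
    return p / p.sum()

# Simulate T observed actions, then hide ~30% of them (missing data).
T = 500
actions = rng.choice(4, size=T, p=policy(theta_true))
observed = rng.random(T) > 0.3  # False => action observation missing

theta = np.zeros(2)
lr = 0.5
for _ in range(200):
    p = policy(theta)
    # E-step: the posterior over each missing action is the current
    # policy, so expected action counts combine the observed counts
    # with policy-filled counts for the missing steps.
    counts = np.bincount(actions[observed], minlength=4).astype(float)
    counts += (~observed).sum() * p
    # M-step (one gradient step): maximum-entropy feature matching,
    # grad = E_data[phi] - E_policy[phi].
    grad = (counts / T) @ phi - p @ phi
    theta += lr * grad

print("estimated theta:", theta, " true theta:", theta_true)
```

At the fixed point the policy's expected features match the empirical features of the observed actions, which is the defining condition of maximum-entropy IRL; the paper's method additionally handles nonoptimal actions and sequential (MDP/POMDP) structure, which this sketch omits.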
ISSN: 0018-9251, 1557-9603
DOI: 10.1109/TAES.2024.3406671