Latent-Maximum-Entropy-Based Cognitive Radar Reward Function Estimation With Nonideal Observations
Published in | IEEE Transactions on Aerospace and Electronic Systems, Vol. 60, no. 5, pp. 6656-6670 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | IEEE, 01.10.2024 |
Subjects | |
Summary: | The concept of “inverse cognition” has recently emerged and attracted significant research attention in the radar community, spanning inverse filtering, inverse cognitive radar (I-CR), and the design of smart interference for counter-adversarial autonomous systems (i.e., the cognitive radar). For instance, identifying whether an adversary cognitive radar's actions (such as waveform selection and beam scheduling) are consistent with constrained utility maximization and, if so, estimating the utility function has led to recent formulations of I-CR. In this I-CR context, we address the challenge of estimating unknown and complex utility functions from nonideal action observations, where nonideal means that observations may be missing or nonoptimal. In this article, we assume that the adversary cognitive radar (CR) optimizes its action policy by maximizing some form of expected utility function with unknown and complex structure over a long time horizon. We then design an inverse reinforcement learning (IRL) method for nonideal observations and illustrate its applicability. The nonideal factors are treated as latent variables, and the I-CR problem is formulated as a latent information inference problem. An expectation–maximization (EM)-based algorithm is then developed to solve the resulting nonconvex, nonlinear optimization problem iteratively through a Lagrangian relaxation reformulation. The performance of the proposed method is evaluated and compared using simulated CR target-tracking scenarios in Markov decision process (MDP) and partially observable MDP settings. The experimental results verify the robustness, effectiveness, and superiority of the proposed method. |
---|---|
ISSN: | 0018-9251, 1557-9603 |
DOI: | 10.1109/TAES.2024.3406671 |
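
The summary above describes treating nonideal observations as latent variables and solving the resulting inference problem with an EM-based algorithm. As a rough illustration only, the block below sketches what a generic latent maximum-entropy EM iteration can look like. It is not the article's formulation: the linear reward $\theta^\top f(\tau)$, the split of each trajectory $\tau = (y, z)$ into an observed part $y$ and a latent part $z$ (for example, missing or corrupted actions), the feature map $f$, and the sample size $N$ are all assumptions introduced here for the sketch.

```latex
% Assumed maximum-entropy model over complete trajectories \tau = (y, z):
p_\theta(\tau) = \frac{\exp\!\big(\theta^\top f(\tau)\big)}{Z(\theta)},
\qquad
Z(\theta) = \sum_{\tau} \exp\!\big(\theta^\top f(\tau)\big).

% Latent maximum-entropy feature-matching condition, given observed parts y_1, \dots, y_N:
\mathbb{E}_{p_\theta}\big[f(\tau)\big]
  = \frac{1}{N} \sum_{i=1}^{N} \sum_{z} p_\theta(z \mid y_i)\, f(y_i, z).

% E-step at iteration k: expected features under the current posterior of the latent part.
\hat{f}^{(k)} = \frac{1}{N} \sum_{i=1}^{N} \sum_{z} p_{\theta^{(k)}}(z \mid y_i)\, f(y_i, z).

% M-step: maximize the expected complete-data log-likelihood, which is concave in \theta.
\theta^{(k+1)} = \arg\max_{\theta} \Big[ \theta^\top \hat{f}^{(k)} - \log Z(\theta) \Big],
\qquad
\nabla_\theta \Big[ \theta^\top \hat{f}^{(k)} - \log Z(\theta) \Big]
  = \hat{f}^{(k)} - \mathbb{E}_{p_\theta}\big[f(\tau)\big].
```

In this sketch each M-step is an ordinary maximum-entropy fit to the imputed feature expectations $\hat{f}^{(k)}$, so the difficulty sits in the posterior $p_\theta(z \mid y_i)$ and in the overall nonconvexity across iterations. The article's Lagrangian relaxation reformulation and its partially observable MDP setting are not reproduced here.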