Robust Imitation via Mirror Descent Inverse Reinforcement Learning
Main Authors |  |
Format | Journal Article |
Language | English |
Published | 20.10.2022 |
Summary: | Recently, adversarial imitation learning has emerged as a scalable reward-acquisition method for inverse reinforcement learning (IRL) problems. However, the estimated reward signals often become uncertain and fail to train a reliable statistical model, since existing methods tend to solve hard optimization problems directly. Inspired by a first-order optimization method called mirror descent, this paper proposes to predict a sequence of reward functions, which are iterative solutions to a constrained convex problem. IRL solutions derived by mirror descent are tolerant to the uncertainty incurred by target density estimation, since the amount of reward learning is regulated with respect to local geometric constraints. We prove that the proposed mirror-descent update rule ensures robust minimization of a Bregman divergence in terms of a rigorous regret bound of $\mathcal{O}(1/T)$ for step sizes $\{\eta_t\}_{t=1}^{T}$. Our IRL method was applied on top of an adversarial framework, and it outperformed existing adversarial methods in an extensive suite of benchmarks. |
DOI: | 10.48550/arxiv.2210.11201 |
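The summary above describes producing a sequence of reward functions as mirror-descent iterates, with each update regulated by a Bregman divergence to the previous iterate. As a rough illustration of that mechanism only (not the paper's algorithm or implementation), the sketch below runs entropic mirror descent (exponentiated gradient) on a tabular reward vector constrained to the simplex; the names `grad_fn`, `n_sa`, the step-size schedule `eta`, and the toy objective in the usage example are all assumptions made for this example.

```python
# Minimal sketch of mirror descent with the negative-entropy mirror map,
# so each update minimizes eta_t * <g, r> + KL(r || r_t) over the simplex.
# This is an illustrative stand-in, not the authors' method.
import numpy as np

def mirror_descent_rewards(grad_fn, n_sa, T=100, eta=lambda t: 1.0 / (t + 1)):
    """Produce a sequence of reward vectors r_1, ..., r_T on the simplex.

    grad_fn(r) should return a (sub)gradient of a convex IRL-style objective
    at the current reward vector r.
    """
    r = np.full(n_sa, 1.0 / n_sa)            # uniform initial reward weights
    iterates = [r.copy()]
    for t in range(T):
        g = grad_fn(r)
        r = r * np.exp(-eta(t) * g)           # exponentiated-gradient step
        r /= r.sum()                          # renormalize onto the simplex
        iterates.append(r.copy())
    return iterates

# Toy usage: pull the reward weights toward a hypothetical target distribution.
if __name__ == "__main__":
    target = np.array([0.1, 0.2, 0.3, 0.4])
    seq = mirror_descent_rewards(lambda r: r - target, n_sa=4, T=200)
    print(np.round(seq[-1], 3))               # approaches the target weights
```

The decreasing step sizes `eta(t)` mirror the role of the schedule $\{\eta_t\}_{t=1}^{T}$ mentioned in the summary: smaller steps keep each iterate close (in KL/Bregman terms) to the previous one, which is the regularizing effect the abstract attributes to the local geometric constraints.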