CODEX: A Cluster-Based Method for Explainable Reinforcement Learning
Main Authors | , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 07.12.2023 |
Summary: | Despite the impressive feats demonstrated by Reinforcement Learning (RL),
these algorithms have seen little adoption in high-risk, real-world
applications due to current difficulties in explaining RL agent actions and
building user trust. We present Counterfactual Demonstrations for Explanation
(CODEX), a method that incorporates semantic clustering, which can effectively
summarize RL agent behavior in the state-action space. Experimentation on the
MiniGrid and StarCraft II gaming environments reveals the semantic clusters
retain temporal as well as entity information, which is reflected in the
constructed summary of agent behavior. Furthermore, clustering the
discrete+continuous game-state latent representations identifies the most
crucial episodic events, demonstrating a relationship between the latent and
semantic spaces. This work contributes to the growing body of research that strives
to unlock the power of RL for widespread use by leveraging and extending
techniques from Natural Language Processing. |
---|---|
DOI: | 10.48550/arxiv.2312.04216 |