Causal Reinforcement Learning in Iterated Prisoner's Dilemma
The iterated prisoner's dilemma (IPD) is an archetypal paradigm to model cooperation and has guided studies on social dilemmas. In this work, we develop a causal reinforcement learning (CRL) strategy in a PD game. An agent is designed to have an explicit causal representation of other agents pl...
Saved in:
Published in | IEEE transactions on computational social systems Vol. 11; no. 2; pp. 2523 - 2534 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published |
Piscataway
IEEE
01.04.2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | The iterated prisoner's dilemma (IPD) is an archetypal paradigm to model cooperation and has guided studies on social dilemmas. In this work, we develop a causal reinforcement learning (CRL) strategy in a PD game. An agent is designed to have an explicit causal representation of other agents playing strategies from the Axelrod tournament. The collection of policies is assembled in an ensemble RL to choose the best strategy. The agent is then tested against selected Axelrod tournament strategies as well as an adaptive agent trained using traditional RL. Results show that our agent is able to play against all other players and score higher while being adaptive in situations where the strategy of the other players' changes. Furthermore, the decision taken by the agent can be explained in terms of the causal representation of the interactions. Based on the decision made by the agent, a human observer can understand the chosen strategy. |
---|---|
ISSN: | 2329-924X 2329-924X 2373-7476 |
DOI: | 10.1109/TCSS.2023.3289470 |