Causal Reinforcement Learning in Iterated Prisoner's Dilemma

Bibliographic Details
Published in: IEEE transactions on computational social systems Vol. 11; no. 2; pp. 2523-2534
Main Authors: Kazemi, Yosra; Chanel, Caroline P. C.; Givigi, Sidney
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.04.2024
Summary: The iterated prisoner's dilemma (IPD) is an archetypal paradigm for modeling cooperation and has guided studies on social dilemmas. In this work, we develop a causal reinforcement learning (CRL) strategy in a PD game. An agent is designed to have an explicit causal representation of other agents playing strategies from the Axelrod tournament. The resulting collection of policies is assembled into an RL ensemble that chooses the best strategy. The agent is then tested against selected Axelrod tournament strategies as well as an adaptive agent trained using traditional RL. Results show that our agent is able to play against all other players and score higher, while adapting in situations where the strategy of the other players changes. Furthermore, the decisions taken by the agent can be explained in terms of the causal representation of the interactions, so a human observer can understand the chosen strategy from the decisions the agent makes.
ISSN: 2329-924X, 2373-7476
DOI: 10.1109/TCSS.2023.3289470
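
For readers unfamiliar with the setting named in the summary, the following is a minimal sketch of an iterated prisoner's dilemma match between two fixed Axelrod-style strategies. It is not the paper's CRL agent or ensemble. The payoff values (5, 3, 1, 0) are the conventional ones from Axelrod's tournaments and are assumed here, since the record does not state them; the function names `tit_for_tat`, `always_defect`, and `play` are hypothetical.

```python
# Illustrative sketch only: payoffs and strategies are assumptions,
# not details taken from the paper described in this record.
from typing import Callable, List, Tuple

C, D = "C", "D"  # cooperate, defect

# Assumed conventional payoffs: temptation 5, reward 3, punishment 1, sucker 0.
# Keys are (move of player A, move of player B); values are (payoff A, payoff B).
PAYOFF = {
    (C, C): (3, 3),
    (C, D): (0, 5),
    (D, C): (5, 0),
    (D, D): (1, 1),
}

# A strategy maps (own history, opponent history) to the next move.
Strategy = Callable[[List[str], List[str]], str]


def tit_for_tat(own: List[str], opp: List[str]) -> str:
    """Cooperate first, then copy the opponent's previous move."""
    return C if not opp else opp[-1]


def always_defect(own: List[str], opp: List[str]) -> str:
    """Defect on every round."""
    return D


def play(a: Strategy, b: Strategy, rounds: int = 10) -> Tuple[int, int]:
    """Play an iterated match and return the cumulative scores (A, B)."""
    hist_a: List[str] = []
    hist_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = a(hist_a, hist_b)
        move_b = b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play(tit_for_tat, always_defect))  # (9, 14)
    print(play(tit_for_tat, tit_for_tat))    # (30, 30)
```

Under these assumed payoffs, tit-for-tat loses only the first round to a constant defector and then matches it, while two tit-for-tat players cooperate throughout. An adaptive agent of the kind the summary describes would replace one of these fixed strategies with a learned policy.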