Human subjects exploit a cognitive map for credit assignment

Bibliographic Details
Published in	Proceedings of the National Academy of Sciences (PNAS), Vol. 118, no. 4, pp. 1-12
Main Authors	Moran, Rani; Dayan, Peter; Dolan, Raymond J.
Format	Journal Article
Language	English
Published	United States: National Academy of Sciences, 26.01.2021
Subjects	Biological Sciences; Social Sciences; model-free; cognitive maps; decision making; model-based; reinforcement learning
Summary:	An influential reinforcement learning framework proposes that behavior is jointly governed by model-free (MF) and model-based (MB) controllers. The former learns the values of actions directly from past encounters, and the latter exploits a cognitive map of the task to calculate these prospectively. Considerable attention has been paid to how these systems interact during choice, but how and whether knowledge of a cognitive map contributes to the way MF and MB controllers assign credit (i.e., to how they revaluate actions and states following the receipt of an outcome) remains underexplored. Here, we examine such sophisticated credit assignment using a dual-outcome bandit task. We provide evidence that knowledge of a cognitive map influences credit assignment in both MF and MB systems, mediating subtly different aspects of apparent relevance. Specifically, we show that MF credit assignment is enhanced for rewards that are related to a choice, in contrast to choice-unrelated rewards, which reinforced subsequent choices negatively. This modulation is only possible based on knowledge of task structure. On the other hand, MB credit assignment was boosted for outcomes that affected differences in values between offered bandits. We consider mechanistic accounts and the normative status of these findings. We suggest the findings extend the scope and sophistication of cognitive map-based credit assignment during reinforcement learning, with implications for understanding behavioral control.
Bibliography:	Edited by Fiery Cushman, Harvard University, Cambridge, MA, and accepted by Editorial Board Member Michael S. Gazzaniga December 16, 2020 (received for review August 11, 2020). Author contributions: R.M. designed research; R.M. performed research; R.M. contributed new reagents/analytic tools; R.M. analyzed data; R.M. and P.D. interpreted the data; and R.M., P.D., and R.J.D. wrote the paper. P.D. and R.J.D. contributed equally to this work.
ISSN:	0027-8424; 1091-6490
DOI:	10.1073/pnas.2016884118