Neural correlates of forward planning in a spatial decision task in humans

Bibliographic Details
Published in	The Journal of Neuroscience, Vol. 31, no. 14, pp. 5526-5539
Main Authors	Simon, Dylan Alexander, Daw, Nathaniel D
Format	Journal Article
Language	English
Published	United States: Society for Neuroscience, 06.04.2011
Subjects	Adolescent; Adult; Algorithms; Brain - blood supply; Brain - physiology; Brain Mapping; Cues; Decision Making - physiology; Female; Humans; Image Processing, Computer-Assisted - methods; Individuality; Likelihood Functions; Linear Models; Magnetic Resonance Imaging - methods; Male; Models, Neurological; Neuropsychological Tests; Oxygen - blood; Probability; Reinforcement (Psychology); Space Perception - physiology; Statistics as Topic; Young Adult

Summary:	Although reinforcement learning (RL) theories have been influential in characterizing the mechanisms for reward-guided choice in the brain, the predominant temporal difference (TD) algorithm cannot explain many flexible or goal-directed actions that have been demonstrated behaviorally. We investigate such actions by contrasting an RL algorithm that is model based, in that it relies on learning a map or model of the task and planning within it, to traditional model-free TD learning. To distinguish these approaches in humans, we used functional magnetic resonance imaging in a continuous spatial navigation task, in which frequent changes to the layout of the maze forced subjects continually to relearn their favored routes, thereby exposing the RL mechanisms used. We sought evidence for the neural substrates of such mechanisms by comparing choice behavior and blood oxygen level-dependent (BOLD) signals to decision variables extracted from simulations of either algorithm. Both choices and value-related BOLD signals in striatum, although most often associated with TD learning, were better explained by the model-based theory. Furthermore, predecessor quantities for the model-based value computation were correlated with BOLD signals in the medial temporal lobe and frontal cortex. These results point to a significant extension of both the computational and anatomical substrates for RL in the brain.
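The contrast drawn in the summary, between planning over a learned map and caching values with TD updates, can be illustrated with a minimal sketch. This is not the authors' model or task: the five-state maze, the reward of 1 at the goal, the discount factor, the learning rate, and the function names `plan_q` and `td_learn` are all assumptions chosen for illustration. The sketch shows only one property, namely that a planner given an updated map revalues routes at once, whereas cached TD values stay stale until new experience arrives.

```python
GAMMA = 0.9  # assumed discount factor
GOAL = 3     # assumed goal state; entering it yields reward 1


def plan_q(maze, goal=GOAL, gamma=GAMMA, sweeps=50):
    """Model-based: value iteration over a known map {state: {action: next_state}}."""
    values = {s: 0.0 for s in maze}

    def backup(nxt):
        # One-step lookahead: immediate reward plus discounted value of the next state.
        return (1.0 if nxt == goal else 0.0) + gamma * values[nxt]

    for _ in range(sweeps):
        for s, actions in maze.items():
            if actions:  # states with no exits (goal, dead ends) keep value 0
                values[s] = max(backup(nxt) for nxt in actions.values())
    return {s: {a: backup(nxt) for a, nxt in acts.items()} for s, acts in maze.items()}


def td_learn(maze, routes, goal=GOAL, alpha=0.5, gamma=GAMMA, repeats=50):
    """Model-free: Q-learning (TD) updates from repeatedly experienced routes."""
    q = {s: {a: 0.0 for a in acts} for s, acts in maze.items()}
    for _ in range(repeats):
        for route in routes:
            s = 0  # every route starts at state 0
            for a in route:
                nxt = maze[s][a]
                r = 1.0 if nxt == goal else 0.0
                best_next = max(q[nxt].values(), default=0.0)
                q[s][a] += alpha * (r + gamma * best_next - q[s][a])
                s = nxt
    return q


def best_first_move(q):
    """Greedy action at the start state."""
    return max(q[0], key=q[0].get)


# Hypothetical maze: a short route 0 -> 1 -> 3 and a long route 0 -> 2 -> 4 -> 3.
maze = {0: {"left": 1, "right": 2}, 1: {"go": 3}, 2: {"go": 4}, 4: {"go": 3}, 3: {}}
q_td = td_learn(maze, routes=[["left", "go"], ["right", "go", "go"]])

# Layout change: the passage out of state 1 is blocked, so it becomes a dead end.
blocked = dict(maze)
blocked[1] = {}
q_mb = plan_q(blocked)

print(best_first_move(q_td), best_first_move(q_mb))  # prints "left right"
```

In this toy setting the TD values learned before the change still favor the now-blocked short route, while the planner, handed the new map, switches to the long route without further experience. The study itself compared full simulations of each algorithm against choices and BOLD signals; nothing here reproduces that analysis.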
ISSN:	0270-6474, 1529-2401
DOI:	10.1523/jneurosci.4647-10.2011