Computing reward-prediction error: an integrated account of cortical timing and basal-ganglia pathways for appetitive and aversive learning
Published in | The European journal of neuroscience Vol. 42; no. 4; pp. 2003 - 2021 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | France, Blackwell Publishing Ltd, 01.08.2015; John Wiley and Sons Inc |
Subjects | |
ISSN | 0953-816X, 1460-9568 |
DOI | 10.1111/ejn.12994 |
Summary: | There are two prevailing notions regarding the involvement of the corticobasal ganglia system in value‐based learning: (i) the direct and indirect pathways of the basal ganglia are crucial for appetitive and aversive learning, respectively, and (ii) the activity of midbrain dopamine neurons represents reward‐prediction error. Although (ii) constitutes a critical assumption of (i), it remains elusive how (ii) holds given (i), with the basal‐ganglia influence on the dopamine neurons. Here we present a computational neural‐circuit model that potentially resolves this issue. Based on the latest analyses of the heterogeneous corticostriatal neurons and connections, our model posits that the direct and indirect pathways, respectively, represent the values of upcoming and previous actions, and up‐regulate and down‐regulate the dopamine neurons via the basal‐ganglia output nuclei. This explains how the difference between the upcoming and previous values, which constitutes the core of reward‐prediction error, is calculated. Simultaneously, it predicts that blockade of the direct/indirect pathway causes a negative/positive shift of reward‐prediction error and thereby impairs learning from positive/negative error, i.e. appetitive/aversive learning. Through simulation of reward‐reversal learning and punishment‐avoidance learning, we show that our model could indeed account for the experimentally observed features that are suggested to support notion (i) and could also provide predictions on neural activity. We also present a behavioral prediction of our model, through simulation of inter‐temporal choice, on how the balance between the two pathways relates to the subject's time preference. These results indicate that our model, incorporating the heterogeneity of the cortical influence on the basal ganglia, is expected to provide a closed‐circuit mechanistic understanding of appetitive/aversive learning.
There are two popular notions in value‐based learning and choice: (i) dopamine represents reward‐prediction error, and (ii) the basal‐ganglia direct and indirect pathways are crucial for appetitive and aversive learning, respectively. We present an integrated account of these two notions, whose relationship has remained unclear. We provide predictions on the activity of cortical and striatal neuron subpopulations and on possible effects of the strengths of the two pathways on the subject's time preference. |
---|---|
Bibliography: | The Ministry of Education, Science, Sports and Culture of Japan - No. 26120710; No. 25115709. Core Research for Evolutional Science and Technology. Japan Society for the Promotion of Science - No. 25250005; No. 25123723; No. 15H01456. Japan Agency for Medical Research and Development. |
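
The Summary states that the direct and indirect pathways carry the values of the upcoming and previous actions and respectively up-regulate and down-regulate the dopamine neurons, so that their difference forms the core of reward-prediction error. As a minimal sketch of that idea, assuming a standard temporal-difference form, the Python below computes the error and shows the direction of the shift when either pathway is weakened. The function name, the `direct_gain` and `indirect_gain` parameters, the discount factor `gamma`, and all numeric values are illustrative assumptions, not taken from the article's circuit model.

```python
def reward_prediction_error(reward, v_upcoming, v_previous,
                            direct_gain=1.0, indirect_gain=1.0, gamma=1.0):
    """Toy reward-prediction error with separate pathway gains (hypothetical)."""
    # Direct pathway: value of the upcoming action, pushing dopamine up.
    excitation = direct_gain * gamma * v_upcoming
    # Indirect pathway: value of the previous action, pushing dopamine down.
    inhibition = indirect_gain * v_previous
    return reward + excitation - inhibition


# Same reward and values in all three cases; only the pathway gains differ.
intact = reward_prediction_error(1.0, 0.5, 0.5)
direct_blocked = reward_prediction_error(1.0, 0.5, 0.5, direct_gain=0.5)
indirect_blocked = reward_prediction_error(1.0, 0.5, 0.5, indirect_gain=0.5)

print(intact, direct_blocked, indirect_blocked)  # 1.0 0.75 1.25
```

With these assumed numbers, halving the direct-pathway gain lowers the error (a negative shift) and halving the indirect-pathway gain raises it (a positive shift), which matches the direction of the prediction in the Summary; the magnitudes carry no meaning.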