Uncertainty-guided learning with scaled prediction errors in the basal ganglia

Bibliographic Details
Published in	PLoS computational biology Vol. 18; no. 5; p. e1009816
Main Authors	Möller, Moritz; Manohar, Sanjay; Bogacz, Rafal
Format	Journal Article
Language	English
Published	United States: Public Library of Science (PLoS), 01.05.2022
Subjects	Algorithms; Basal ganglia; Basal Ganglia - physiology; Biology and Life Sciences; Circuits; Dopamine; Dopamine - physiology; Dopamine receptors; Error signals; Ganglia; Kalman filters; Learning; Learning - physiology; Mean; Medicine and Health Sciences; Neostriatum; Neurons; Physical Sciences; Physiological aspects; Predictions; Psychological research; Reinforcement; Reinforcement, Psychology; Research and Analysis Methods; Reward; Reward (Psychology); Social Sciences; Standard deviation; Tracking; Uncertainty; United Kingdom
Summary:	To accurately predict rewards associated with states or actions, the variability of observations has to be taken into account. In particular, when the observations are noisy, the individual rewards should have less influence on tracking of the average reward, and the estimate of the mean reward should be updated to a smaller extent after each observation. However, it is not known how the magnitude of the observation noise might be tracked and used to control prediction updates in the brain reward system. Here, we introduce a new model that uses simple, tractable learning rules that track the mean and standard deviation of reward, and leverages prediction errors scaled by uncertainty as the central feedback signal. We show that the new model has an advantage over conventional reinforcement learning models in a value tracking task, and approaches a theoretical limit of performance provided by the Kalman filter. Further, we propose a possible biological implementation of the model in the basal ganglia circuit. In the proposed network, dopaminergic neurons encode reward prediction errors scaled by the standard deviation of rewards. We show that such scaling may arise if the striatal neurons learn the standard deviation of rewards and modulate the activity of dopaminergic neurons. The model is consistent with experimental findings concerning dopamine prediction error scaling relative to reward magnitude, and with many features of striatal plasticity. Our results span the levels of implementation, algorithm, and computation, and might have important implications for understanding the dopaminergic prediction error signal and its relation to adaptive and effective learning.
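
The idea of tracking the mean and standard deviation of reward with a prediction error scaled by uncertainty can be illustrated with a short simulation. The following is only a minimal sketch under stated assumptions: the update rules, the function names `track_reward` and `simulate`, and the learning rates `alpha_m` and `alpha_s` are chosen here for illustration and are not taken from the article, whose exact rules may differ. Rewards are assumed to be Gaussian with a fixed mean and standard deviation.

```python
import random


def track_reward(rewards, alpha_m=0.05, alpha_s=0.02, m0=0.0, s0=1.0):
    """Track the mean (m) and spread (s) of a stream of rewards.

    Illustrative rules (an assumption, not the article's exact equations):
      delta = (r - m) / s                 prediction error scaled by uncertainty
      m += alpha_m * delta                noisier rewards give smaller mean updates
      s += alpha_s * s * (delta**2 - 1)   s settles where the mean of delta**2 is 1
    With alpha_s < 1 the factor (1 + alpha_s * (delta**2 - 1)) stays positive,
    so s remains positive if s0 is positive.
    """
    m, s = m0, s0
    for r in rewards:
        delta = (r - m) / s          # scaled prediction error
        m += alpha_m * delta         # mean estimate update
        s += alpha_s * s * (delta ** 2 - 1.0)  # spread estimate update
    return m, s


def simulate(mean=5.0, sd=2.0, n=5000, seed=0):
    """Draw n Gaussian rewards (hypothetical task) and return the final (m, s)."""
    rng = random.Random(seed)
    rewards = [rng.gauss(mean, sd) for _ in range(n)]
    return track_reward(rewards)
```

Calling `simulate()` returns estimates that should settle near the true mean and standard deviation of the reward stream. Because the mean update is proportional to `delta` rather than to the raw error, the effective learning rate is roughly `alpha_m / s`, so a noisier reward source moves the mean estimate less per observation, which is the qualitative behaviour the summary describes.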
Bibliography:	The authors have declared that no competing interests exist.
ISSN:	1553-734X, 1553-7358
DOI:	10.1371/journal.pcbi.1009816