Multi-task reinforcement learning in humans

Bibliographic Details
Published in	Nature Human Behaviour, Vol. 5, no. 6, pp. 764-773
Main Authors	Tomov, Momchil S.; Schulz, Eric; Gershman, Samuel J.
Format	Journal Article
Language	English
Published	London: Nature Publishing Group UK, 01.06.2021
Subjects	631/378; 631/477; 631/477/2811; Algorithms; Behavior; Behavioral Sciences; Biomedical and Life Sciences; Decision making; Experimental Psychology; Humans; Intelligence; Learning; Life Sciences; Microeconomics; Neurosciences; Personality and Social Psychology; Reinforcement

Summary:	The ability to transfer knowledge across tasks and generalize to novel ones is an important hallmark of human intelligence. Yet not much is known about human multitask reinforcement learning. We study participants’ behaviour in a two-step decision-making task with multiple features and changing reward functions. We compare their behaviour with two algorithms for multitask reinforcement learning, one that maps previous policies and encountered features to new reward functions and one that approximates value functions across tasks, as well as with standard model-based and model-free algorithms. Across three exploratory experiments and a large preregistered confirmatory experiment, our results provide evidence that participants who are able to learn the task use a strategy that maps previously learned policies to novel scenarios. These results enrich our understanding of human reinforcement learning in complex environments with changing task demands. Studying behaviour in a decision-making task with multiple features and changing reward functions, Tomov et al. find that a strategy that combines successor features with generalized policy iteration predicts behaviour best.
ISSN:	2397-3374
DOI:	10.1038/s41562-020-01035-y
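
The summary names a strategy that combines successor features with generalized policy iteration. As a rough illustration of that general idea (not the authors' model, task, or code), the sketch below assumes each previously learned policy has stored successor features for the actions available in the current state, and that the new task's reward is linear in the features with weights `w`. The function name `gpi_action`, the array layout, and all numbers are hypothetical.

```python
import numpy as np

def gpi_action(psi, w):
    """Pick an action by generalized policy iteration over stored policies.

    psi: array of shape (n_policies, n_actions, n_features); the successor
         features of each old policy for each action in the current state
         (assumed layout, for illustration only).
    w:   array of shape (n_features,); reward weights of the new task.
    """
    # Value of each (policy, action) pair under the new reward weights.
    q = psi @ w                      # shape (n_policies, n_actions)
    # For every action take the best old policy, then act greedily.
    return int(np.argmax(q.max(axis=0)))

# Hypothetical successor features for two old policies, two actions,
# and three features.
psi = np.array([
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],   # policy 0
    [[0.0, 0.0, 1.0], [0.5, 0.5, 0.0]],   # policy 1
])

# Two hypothetical new tasks that differ only in their reward weights.
w_task_a = np.array([0.0, 1.0, 2.0])
w_task_b = np.array([0.0, 1.0, 0.0])

choice_a = gpi_action(psi, w_task_a)
choice_b = gpi_action(psi, w_task_b)
print(choice_a, choice_b)
```

Because only `w` changes between tasks, the stored successor features are reused as they are, which is the sense in which previously learned policies are mapped to a new reward function.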