Saccade selection when reward probability is dynamically manipulated using Markov chains

Bibliographic Details
Published in: Experimental Brain Research, Vol. 187, No. 2, pp. 321–330
Main Authors: Nummela, Samuel U.; Lovejoy, Lee P.; Krauzlis, Richard J.
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer-Verlag, 01.05.2008

Summary: Markov chains (stochastic processes where probabilities are assigned based on the previous outcome) are commonly used to examine the transitions between behavioral states, such as those that occur during foraging or social interactions. However, relatively little is known about how well primates can incorporate knowledge about Markov chains into their behavior. Saccadic eye movements are an example of a simple behavior influenced by information about probability, and thus are good candidates for testing whether subjects can learn Markov chains. In addition, when investigating the influence of probability on saccade target selection, the use of Markov chains could provide an alternative method that avoids confounds present in other task designs. To investigate these possibilities, we evaluated human behavior on a task in which stimulus reward probabilities were assigned using a Markov chain. On each trial, the subject selected one of four identical stimuli by saccade; after selection, feedback indicated the rewarded stimulus. Each session consisted of 200–600 trials, and on some sessions, the reward magnitude varied. On sessions with a uniform reward, subjects (n = 6) learned to select stimuli at a frequency close to reward probability, which is similar to human behavior on matching or probability classification tasks. When informed that a Markov chain assigned reward probabilities, subjects (n = 3) learned to select the stimulus with the greatest reward probability more often, bringing them close to behavior that maximizes reward. On sessions where reward magnitude varied across stimuli, subjects (n = 6) demonstrated preferences for both greater reward probability and greater reward magnitude, resulting in a preference for greater expected value (the product of reward probability and magnitude). These results demonstrate that Markov chains can be used to dynamically assign probabilities that are rapidly exploited by human subjects during saccade target selection.
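
To make the task design concrete, the sketch below simulates a session of this kind in Python. The transition matrix T, the reward magnitudes, and the idealized subject that always chooses the stimulus with the greatest expected value are all hypothetical assumptions for illustration; the abstract does not report the actual parameters or transition structure used in the study.

```python
import random

# Hypothetical 4x4 transition matrix: row i gives the probability that each
# of the four stimuli is rewarded on the next trial, given that stimulus i
# was rewarded on the current trial. Values are illustrative only.
T = [
    [0.10, 0.60, 0.20, 0.10],
    [0.10, 0.10, 0.60, 0.20],
    [0.20, 0.10, 0.10, 0.60],
    [0.60, 0.20, 0.10, 0.10],
]

MAGNITUDES = [1.0, 2.0, 1.0, 2.0]  # hypothetical per-stimulus reward sizes


def run_session(n_trials=400, seed=0):
    """Simulate one session with an idealized 'maximizing' subject that
    knows the chain and always picks the stimulus with the greatest
    expected value (reward probability times reward magnitude)."""
    rng = random.Random(seed)
    rewarded = rng.randrange(4)  # rewarded stimulus on the previous trial
    total = 0.0
    for _ in range(n_trials):
        probs = T[rewarded]  # this trial's reward probabilities
        # Expected value of each stimulus = probability * magnitude.
        ev = [p * m for p, m in zip(probs, MAGNITUDES)]
        choice = max(range(4), key=lambda i: ev[i])
        # The Markov chain draws which stimulus is actually rewarded.
        rewarded = rng.choices(range(4), weights=probs)[0]
        if choice == rewarded:  # feedback: reward only if the choice paid out
            total += MAGNITUDES[choice]
    return total


print(run_session())  # total reward earned over the session
```

A "matching" subject, like those in the uniform-reward sessions, could instead be modeled by sampling the choice in proportion to probs rather than always maximizing.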
ISSN: 0014-4819 (print); 1432-1106 (electronic)
DOI: 10.1007/s00221-008-1306-z