MULTI-ARMED BANDITS UNDER GENERAL DEPRECIATION AND COMMITMENT
| Published in | Probability in the Engineering and Informational Sciences, Vol. 29, No. 1, pp. 51–76 |
| --- | --- |
| Main authors | Cowan, Wesley; Katehakis, Michael N. |
| Format | Journal article |
| Language | English |
| Published | New York, USA: Cambridge University Press, 01.01.2015 |
Summary: Generally, the multi-armed bandit problem has been studied under the setting in which, at each time step over an infinite horizon, a controller chooses to activate a single process, or bandit, out of a finite collection of independent processes (statistical experiments, populations, etc.) for a single period, receiving a reward that is a function of the activated process and, in doing so, advancing the chosen process. Classically, rewards are discounted by a constant factor β ∈ (0, 1) per round. In this paper, we present a solution to the problem, with potentially non-Markovian, uncountable-state-space reward processes, under a framework in which, first, the discount factors may be non-uniform and vary over time, and second, the periods of activation of each bandit may not be fixed or uniform, being subject instead to a possibly stochastic duration of activation before a change to a different bandit is allowed. The solution is based on generalized restart-in-state indices, and it utilizes a view of the problem not as "decisions over state space" but rather as "decisions over time."
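The summary's central objects can be made concrete in the classical constant-discount setting that the paper generalizes. The following is a minimal sketch, not taken from the paper itself: it states the standard discounted objective and the well-known restart-in-state characterization of the Gittins index due to Katehakis and Veinott, with illustrative notation (P, r, β, ν assumed here, not drawn from the article).

```latex
% Classical discounted objective: at each round t the policy \pi activates one
% bandit \pi(t); the activated process advances while all others stay frozen.
V(\pi) = \mathbb{E}\left[\sum_{t=0}^{\infty} \beta^{t}\, R_{\pi(t)}\!\left(X_{\pi(t)}(t)\right)\right],
\qquad \beta \in (0,1).

% Restart-in-state characterization: the index of state x is, up to the factor
% (1-\beta), the value of an auxiliary problem in which the controller may, at
% every step, either continue from the current state y or restart at x.
V_{x}(y) = \max\left\{ r(y) + \beta \sum_{z} P(y,z)\, V_{x}(z),\;
                       r(x) + \beta \sum_{z} P(x,z)\, V_{x}(z) \right\},
\qquad \nu(x) = (1-\beta)\, V_{x}(x).
```

A restart index of this form can be computed by value iteration; below is a small, self-contained sketch for a finite Markov bandit (the function name and toy chain are illustrative, not from the paper). The paper's contribution, by contrast, is to extend such indices to non-uniform, time-varying discounting and stochastic commitment periods.

```python
import numpy as np

def restart_in_state_index(P, r, beta, x, tol=1e-10, max_iter=100_000):
    """Gittins index of state x for a finite Markov bandit (P, r) with constant
    discount beta, computed by value iteration on the restart-in-x problem.
    Illustrative only: the paper generalizes beyond this classical setting."""
    V = np.zeros(len(r))
    for _ in range(max_iter):
        cont = r + beta * (P @ V)            # keep advancing from each state y
        restart = r[x] + beta * (P[x] @ V)   # or restart the process at state x
        V_new = np.maximum(cont, restart)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return (1.0 - beta) * V[x]

# Toy two-state chain: state 1 pays 1 per activation but drifts back to state 0.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
r = np.array([0.0, 1.0])
print([restart_in_state_index(P, r, beta=0.9, x=s) for s in range(2)])
```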
| ISSN | 0269-9648; 1469-8951 |
| --- | --- |
| DOI | 10.1017/S0269964814000217 |