A modern Bayesian look at the multi-armed bandit

Bibliographic Details
Published in: Applied stochastic models in business and industry, Vol. 26, no. 6, pp. 639–658
Main Author: Scott, Steven L.
Format: Journal Article
Language: English
Published: Chichester, UK: John Wiley & Sons, Ltd, 01.11.2010

Abstract:
A multi‐armed bandit is an experiment with the goal of accumulating rewards from a payoff distribution with unknown parameters that are to be learned sequentially. This article describes a heuristic for managing multi‐armed bandits called randomized probability matching, which randomly allocates observations to arms according to the Bayesian posterior probability that each arm is optimal. Advances in Bayesian computation have made randomized probability matching easy to apply to virtually any payoff distribution. This flexibility frees the experimenter to work with payoff distributions that correspond to certain classical experimental designs that have the potential to outperform methods that are ‘optimal’ in simpler contexts. I summarize the relationships between randomized probability matching and several related heuristics that have been used in the reinforcement learning literature. Copyright © 2010 John Wiley & Sons, Ltd.
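The allocation rule the abstract describes (randomized probability matching, often called Thompson sampling) can be sketched for the simplest case of Bernoulli payoffs with conjugate Beta posteriors. The snippet below is an illustrative simulation, not code from the article; the three arm success rates and the uniform Beta(1, 1) priors are hypothetical choices.

```python
import random

def choose_arm(successes, failures):
    """Randomized probability matching: draw one sample from each arm's
    Beta posterior and play the arm whose draw is largest.  Taking the
    argmax of posterior draws selects each arm with probability equal
    to its posterior probability of being the optimal arm."""
    draws = [random.betavariate(s + 1, f + 1)  # Beta(1, 1) uniform prior
             for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda k: draws[k])

def run_bandit(true_rates, n_rounds, seed=0):
    """Simulate sequential allocation over n_rounds Bernoulli pulls."""
    random.seed(seed)
    k = len(true_rates)
    successes, failures = [0] * k, [0] * k
    for _ in range(n_rounds):
        arm = choose_arm(successes, failures)
        if random.random() < true_rates[arm]:  # observe a Bernoulli reward
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

# Hypothetical three-armed bandit: as evidence accumulates, allocation
# should concentrate on the best arm (true rate 0.5).
s, f = run_bandit([0.2, 0.3, 0.5], 2000)
```

Because the posterior for a clearly inferior arm rarely produces the largest draw, exploration of bad arms decays automatically; no separate exploration schedule is needed.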
Author: Scott, Steven L. (email: stevescott@google.com)
Content Type: Journal Article
Copyright: © 2010 John Wiley & Sons, Ltd.
DOI: 10.1002/asmb.874
Discipline: Mathematics; Business
EISSN: 1526-4025
End Page: 658
Genre: article
ISSN: 1524-1904 (print); 1526-4025 (electronic)
IsPeerReviewed: true
IsScholarly: true
Issue: 6
License: http://onlinelibrary.wiley.com/termsAndConditions#vor
Page Count: 20
Publication Date: November/December 2010
Publication Place: Chichester, UK
Publication Title: Applied stochastic models in business and industry
Publication Title (abbreviated): Appl. Stochastic Models Bus. Ind.
Publication Year: 2010
Publisher: John Wiley & Sons, Ltd
References
Powell WB. Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley: New York, 2007.
Gittins J, Wang Y-G. The learning component of dynamic allocation indices. The Annals of Statistics 1992; 20:1625-1636.
West M, Harrison J. Bayesian Forecasting and Dynamic Models. Springer: Berlin, 1997.
Thompson WR. On the theory of apportionment. American Journal of Mathematics 1935; 57(2):450-456.
Brezzi M, Lai TL. Incomplete learning from endogenous data in dynamic allocation. Econometrica 2000; 68(6):1511-1516.
Whittle P. Restless bandits: activity allocation in a changing world. Journal of Applied Probability 1988; 25A:287-298.
Whittle P. Discussion of 'bandit processes and dynamic allocation indices'. Journal of the Royal Statistical Society, Series B: Methodological 1979; 41:165.
Berry DA, Fristedt B. Bandit Problems: Sequential Allocation of Experiments. Chapman & Hall: London, 1985.
Chaloner K, Verdinelli I. Bayesian experimental design: a review. Statistical Science 1995; 10:273-304.
Doucet A, de Freitas N, Gordon N. Sequential Monte Carlo Methods in Practice. Springer: Berlin, 2001.
Cox DR, Reid N. The Theory of the Design of Experiments. Chapman & Hall, CRC: London, Boca Raton, 2000.
Lai T-L. Adaptive treatment allocation and the multi-armed bandit problem. The Annals of Statistics 1987; 15(3):1091-1114.
Yang Y, Zhu D. Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates. The Annals of Statistics 2002; 30:100-121.
Bellman RE. A problem in the sequential design of experiments. Sankhya Series A 1956; 30:221-252.
Whittle P. Arm-acquiring bandits. The Annals of Probability 1981; 9(2):284-292.
Agrawal R. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability 1995; 27:1054-1078.
Brezzi M, Lai TL. Optimal learning and experimentation in bandit problems. Journal of Economic Dynamics and Control 2002; 27:87-108.
Robbins H. A sequential decision procedure with a finite memory. Proceedings of the National Academy of Sciences 1956; 42:920-923.
Thompson WR. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 1933; 25:285-294.
Auer P, Cesa-Bianchi N, Fischer P. Finite-time analysis of the multiarmed bandit problem. Machine Learning 2002; 47:235-256.
Albert JH, Chib S. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 1993; 88:669-679.
Gittins JC. Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society, Series B: Methodological 1979; 41:148-177.
Lai T-L, Robbins H. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 1985; 6:4-22.
Tierney L. Markov chains for exploring posterior distributions (disc: P1728-1762). The Annals of Statistics 1994; 22:1701-1728.
Sutton RS, Barto AG. Reinforcement Learning: An Introduction. MIT Press: Cambridge, MA, 1998.
Luce D. Individual Choice Behavior. Wiley: New York, 1959.
Start Page: 639
Subject Terms:
Bayesian adaptive design
Bayesian analysis
Business
exploration vs exploitation
Flexibility
Heuristic
Matching
Mathematical models
Optimization
probability matching
Reinforcement
sequential design
Title: A modern Bayesian look at the multi-armed bandit
URI:
https://api.istex.fr/ark:/67375/WNG-J0Z227KZ-V/fulltext.pdf
https://onlinelibrary.wiley.com/doi/abs/10.1002%2Fasmb.874
https://www.proquest.com/docview/1031289652
Volume: 26