A modern Bayesian look at the multi-armed bandit

Bibliographic Details
Published in: Applied stochastic models in business and industry, Vol. 26, no. 6, pp. 639–658
Main Author: Scott, Steven L.
Format: Journal Article
Language: English
Published: Chichester, UK: John Wiley & Sons, Ltd, 01.11.2010

Abstract:
A multi‐armed bandit is an experiment with the goal of accumulating rewards from a payoff distribution with unknown parameters that are to be learned sequentially. This article describes a heuristic for managing multi‐armed bandits called randomized probability matching, which randomly allocates observations to arms according to the Bayesian posterior probability that each arm is optimal. Advances in Bayesian computation have made randomized probability matching easy to apply to virtually any payoff distribution. This flexibility frees the experimenter to work with payoff distributions that correspond to certain classical experimental designs that have the potential to outperform methods that are ‘optimal’ in simpler contexts. I summarize the relationships between randomized probability matching and several related heuristics that have been used in the reinforcement learning literature. Copyright © 2010 John Wiley & Sons, Ltd.
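The allocation rule the abstract describes (randomized probability matching, often called Thompson sampling) can be sketched for the simplest case of Bernoulli payoffs with conjugate Beta posteriors. The snippet below is an illustrative simulation, not code from the article; the three arm success rates and the uniform Beta(1, 1) priors are hypothetical choices.

```python
import random

def choose_arm(successes, failures):
    """Randomized probability matching: draw one sample from each arm's
    Beta posterior and play the arm whose draw is largest.  Taking the
    argmax of posterior draws selects each arm with probability equal
    to its posterior probability of being the optimal arm."""
    draws = [random.betavariate(s + 1, f + 1)  # Beta(1, 1) uniform prior
             for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda k: draws[k])

def run_bandit(true_rates, n_rounds, seed=0):
    """Simulate sequential allocation over n_rounds Bernoulli pulls."""
    random.seed(seed)
    k = len(true_rates)
    successes, failures = [0] * k, [0] * k
    for _ in range(n_rounds):
        arm = choose_arm(successes, failures)
        if random.random() < true_rates[arm]:  # observe a Bernoulli reward
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

# Hypothetical three-armed bandit: as evidence accumulates, allocation
# should concentrate on the best arm (true rate 0.5).
s, f = run_bandit([0.2, 0.3, 0.5], 2000)
```

Because the posterior for a clearly inferior arm rarely produces the largest draw, exploration of bad arms decays automatically; no separate exploration schedule is needed.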
Author: Scott, Steven L. (email: stevescott@google.com)
Content Type: Journal Article
Copyright: © 2010 John Wiley & Sons, Ltd.
DOI: 10.1002/asmb.874
Discipline: Mathematics; Business
EISSN: 1526-4025
End Page: 658
Genre: article
ISSN: 1524-1904 (print); 1526-4025 (electronic)
IsPeerReviewed: true
IsScholarly: true
Issue: 6
License: http://onlinelibrary.wiley.com/termsAndConditions#vor
Page Count: 20
Publication Date: November/December 2010
Publication Place: Chichester, UK
Publication Title: Applied stochastic models in business and industry
Publication Title (abbreviated): Appl. Stochastic Models Bus. Ind.
Publication Year: 2010
Publisher: John Wiley & Sons, Ltd
References
Powell WB. Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley: New York, 2007.
Gittins J, Wang Y-G. The learning component of dynamic allocation indices. The Annals of Statistics 1992; 20:1625-1636.
West M, Harrison J. Bayesian Forecasting and Dynamic Models. Springer: Berlin, 1997.
Thompson WR. On the theory of apportionment. American Journal of Mathematics 1935; 57(2):450-456.
Brezzi M, Lai TL. Incomplete learning from endogenous data in dynamic allocation. Econometrica 2000; 68(6):1511-1516.
Whittle P. Restless bandits: activity allocation in a changing world. Journal of Applied Probability 1988; 25A:287-298.
Whittle P. Discussion of 'bandit processes and dynamic allocation indices'. Journal of the Royal Statistical Society, Series B: Methodological 1979; 41:165.
Berry DA, Fristedt B. Bandit Problems: Sequential Allocation of Experiments. Chapman & Hall: London, 1985.
Chaloner K, Verdinelli I. Bayesian experimental design: a review. Statistical Science 1995; 10:273-304.
Doucet A, de Freitas N, Gordon N. Sequential Monte Carlo Methods in Practice. Springer: Berlin, 2001.
Cox DR, Reid N. The Theory of the Design of Experiments. Chapman & Hall, CRC: London, Boca Raton, 2000.
Lai T-L. Adaptive treatment allocation and the multi-armed bandit problem. The Annals of Statistics 1987; 15(3):1091-1114.
Yang Y, Zhu D. Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates. The Annals of Statistics 2002; 30:100-121.
Bellman RE. A problem in the sequential design of experiments. Sankhya Series A 1956; 30:221-252.
Whittle P. Arm-acquiring bandits. The Annals of Probability 1981; 9(2):284-292.
Agrawal R. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability 1995; 27:1054-1078.
Brezzi M, Lai TL. Optimal learning and experimentation in bandit problems. Journal of Economic Dynamics and Control 2002; 27:87-108.
Robbins H. A sequential decision procedure with a finite memory. Proceedings of the National Academy of Sciences 1956; 42:920-923.
Thompson WR. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 1933; 25:285-294.
Auer P, Cesa-Bianchi N, Fischer P. Finite-time analysis of the multiarmed bandit problem. Machine Learning 2002; 47:235-256.
Albert JH, Chib S. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 1993; 88:669-679.
Gittins JC. Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society, Series B: Methodological 1979; 41:148-177.
Lai T-L, Robbins H. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 1985; 6:4-22.
Tierney L. Markov chains for exploring posterior distributions (disc: P1728-1762). The Annals of Statistics 1994; 22:1701-1728.
Sutton RS, Barto AG. Reinforcement Learning: An Introduction. MIT Press: Cambridge, MA, 1998.
Luce D. Individual Choice Behavior. Wiley: New York, 1959.
Start Page: 639
Subject Terms:
Bayesian adaptive design
Bayesian analysis
Business
exploration vs exploitation
Flexibility
Heuristic
Matching
Mathematical models
Optimization
probability matching
Reinforcement
sequential design
Title: A modern Bayesian look at the multi-armed bandit
URI:
https://api.istex.fr/ark:/67375/WNG-J0Z227KZ-V/fulltext.pdf
https://onlinelibrary.wiley.com/doi/abs/10.1002%2Fasmb.874
https://www.proquest.com/docview/1031289652
Volume: 26