Bandit Learning with Predicted Context: Regret Analysis and Selective Context Query

Bibliographic Details
Published in Annual Joint Conference of the IEEE Computer and Communications Societies, pp. 1-10
Main Authors Yang, Jianyi; Ren, Shaolei
Format Conference Proceeding
Language English
Published IEEE 10.05.2021
ISSN 2641-9874
DOI 10.1109/INFOCOM42981.2021.9488896

Abstract Contextual bandit learning selects actions (i.e., arms) based on context information to maximize rewards while balancing exploitation and exploration. In many applications (e.g., cloud resource management with dynamic workloads), before arm selection, the agent/learner can either predict context information online based on context history or selectively query the context from an outside expert. Motivated by this practical consideration, we study a novel contextual bandit setting where context information is either predicted online or queried from an expert. First, considering predicted context only, we quantify the impact of context prediction on the cumulative regret (compared to an oracle with perfect context information) by deriving an upper bound on regret, which takes the form of a weighted combination of the regret incurred by standard bandit learning and the context prediction error. Then, inspired by the regret's structural decomposition, we propose context query algorithms that selectively obtain an outside expert's input (subject to a total query budget) for more accurate context, decreasing the overall regret. Finally, we apply our algorithms to virtual machine scheduling on cloud platforms. The simulation results validate our regret analysis and show the effectiveness of our selective context query algorithms.
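
The selective-query idea in the abstract can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the authors' algorithm: the arm set, the reward shape, the two noise levels, the threshold rule, and the function name `simulate` are all hypothetical. The reward function is assumed known, so the sketch leaves out exploration and isolates the part of the regret caused by context prediction error.

```python
import random

ARMS = [0.0, 0.5, 1.0]  # hypothetical arm positions on the unit interval


def simulate(horizon=200, budget=20, threshold=0.5, seed=0):
    """Return (cumulative regret vs. a perfect-context oracle, queries used)."""
    rng = random.Random(seed)
    queries_used = 0
    regret = 0.0
    for _ in range(horizon):
        true_ctx = rng.random()
        # Assumption: the learner knows how noisy this round's prediction is.
        noise_scale = rng.choice([0.05, 0.8])
        predicted_ctx = true_ctx + rng.uniform(-noise_scale, noise_scale)

        # Selective query: spend budget only on unreliable predictions.
        if noise_scale > threshold and queries_used < budget:
            ctx = true_ctx  # assumed: the expert returns the exact context
            queries_used += 1
        else:
            ctx = predicted_ctx

        # Toy reward of an arm is -|arm - true_ctx|; it is assumed known,
        # so the learner plays the arm closest to the context it holds.
        chosen = min(ARMS, key=lambda a: abs(a - ctx))
        oracle = min(ARMS, key=lambda a: abs(a - true_ctx))
        regret += abs(chosen - true_ctx) - abs(oracle - true_ctx)
    return regret, queries_used


if __name__ == "__main__":
    for b in (0, 20):
        total, used = simulate(budget=b)
        print(f"budget={b:2d} queries={used:2d} regret={total:.3f}")
```

Both runs share a seed and the query branch draws no random numbers, so they see identical contexts and predictions. In this toy setting a queried round contributes zero regret, which means the run with a query budget cannot accumulate more regret than the run without one.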
Author_xml – sequence: 1
  givenname: Jianyi
  surname: Yang
  fullname: Yang, Jianyi
  organization: University of California, Riverside
– sequence: 2
  givenname: Shaolei
  surname: Ren
  fullname: Ren, Shaolei
  organization: University of California, Riverside
Discipline Engineering
EISBN 9781665403252
166540325X
EISSN 2641-9874
EndPage 10
ExternalDocumentID 9488896
Genre orig-research
IsPeerReviewed false
IsScholarly true
Language English
PageCount 10
PublicationDate 2021-May-10
PublicationDateYYYYMMDD 2021-05-10
PublicationTitle Annual Joint Conference of the IEEE Computer and Communications Societies
PublicationTitleAbbrev INFOCOM
PublicationYear 2021
Publisher IEEE
StartPage 1
SubjectTerms Cloud computing
Conferences
Dynamic scheduling
Prediction algorithms
Simulation
Upper bound
Virtual machining
URI https://ieeexplore.ieee.org/document/9488896