Bandit Learning with Predicted Context: Regret Analysis and Selective Context Query

Bibliographic Details
Published in Annual Joint Conference of the IEEE Computer and Communications Societies pp. 1-10
Main Authors Yang, Jianyi; Ren, Shaolei
Format Conference Proceeding
Language English
Published IEEE 10.05.2021
ISSN 2641-9874
DOI 10.1109/INFOCOM42981.2021.9488896

Summary: Contextual bandit learning selects actions (i.e., arms) based on context information to maximize rewards while balancing exploitation and exploration. In many applications (e.g., cloud resource management with dynamic workloads), before arm selection, the agent/learner can either predict context information online based on context history or selectively query the context from an outside expert. Motivated by this practical consideration, we study a novel contextual bandit setting where context information is either predicted online or queried from an expert. First, considering predicted context only, we quantify the impact of context prediction on the cumulative regret (compared to an oracle with perfect context information) by deriving an upper bound on regret, which takes the form of a weighted combination of the regret incurred by standard bandit learning and the context prediction error. Then, inspired by the regret's structural decomposition, we propose context query algorithms that selectively obtain an outside expert's input (subject to a total query budget) for more accurate context, decreasing the overall regret. Finally, we apply our algorithms to virtual machine scheduling on cloud platforms. The simulation results validate our regret analysis and show the effectiveness of our selective context query algorithms.
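
The setting in the summary can be illustrated with a small toy sketch. It is not the authors' algorithm: it assumes a linear reward model with known arm weights (so no bandit learning takes place and only the context-error part of the regret is visible), a hypothetical per-round `uncertainty` score for the predicted context, and a simple threshold rule for spending the query budget. All names and numbers are illustrative.

```python
def arm_rewards(theta, context):
    """Linear reward of each arm: dot product of its weight row with the context."""
    return [sum(w * x for w, x in zip(row, context)) for row in theta]


def best_arm(theta, context):
    """Index of the arm with the highest reward under the given context."""
    rewards = arm_rewards(theta, context)
    return max(range(len(rewards)), key=rewards.__getitem__)


def run_rounds(theta, true_ctxs, pred_ctxs, uncertainty, budget, threshold):
    """Return (cumulative regret vs. a perfect-context oracle, queries used).

    Assumed rule (hypothetical): query the expert, i.e. use the true context,
    when the uncertainty score reaches the threshold and budget remains;
    otherwise act on the predicted context.
    """
    regret, used = 0.0, 0
    for x_true, x_pred, u in zip(true_ctxs, pred_ctxs, uncertainty):
        if used < budget and u >= threshold:
            x_used = x_true
            used += 1
        else:
            x_used = x_pred
        rewards = arm_rewards(theta, x_true)  # rewards realized under the true context
        regret += max(rewards) - rewards[best_arm(theta, x_used)]
    return regret, used


# Toy data: two arms, two context features, three rounds.
theta = [[1.0, 0.0], [0.0, 1.0]]
true_ctxs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
pred_ctxs = [[0.4, 0.6], [0.3, 0.7], [0.55, 0.45]]
uncertainty = [0.5, 0.1, 0.05]

no_query = run_rounds(theta, true_ctxs, pred_ctxs, uncertainty, budget=0, threshold=0.3)
with_query = run_rounds(theta, true_ctxs, pred_ctxs, uncertainty, budget=1, threshold=0.3)
print(no_query, with_query)
```

In this toy run, only the first round has a prediction error large enough to flip the chosen arm, so spending a single query there removes all of the regret caused by the predicted context.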