High dimensional stochastic linear contextual bandit with missing covariates

Bibliographic Details
Published in	arXiv.org
Main Authors	Jang, Byoungwook; Nepper, Julia; Chevrette, Marc; Handelsman, Jo; Hero, Alfred O
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 22.07.2022
Subjects	Algorithms; Context; Convergence; Decision making; Decision theory; Design of experiments; Eigenvalues; Gene expression; Probability theory; Random noise; Upper bounds

More Information
Summary:	Recent work on bandit problems has adopted lasso convergence theory in the sequential decision-making setting. Even with fully observed contexts, two technical challenges hinder the application of existing lasso convergence theory: 1) proving the restricted eigenvalue condition under conditionally sub-Gaussian noise and 2) accounting for the dependence between the context variables and the chosen actions. This paper studies the effect of missing covariates on regret for stochastic linear bandit algorithms. Our work provides a high-probability upper bound on the regret incurred by the proposed algorithm in terms of covariate sampling probabilities, showing that the regret degrades due to missingness by at most \(\zeta_{\min}^2\), where \(\zeta_{\min}\) is the minimum probability of observing covariates in the context vector. We illustrate our algorithm for the practical application of experimental design for collecting gene expression data by a sequential selection of class-discriminating DNA probes.
ISSN:	2331-8422
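
Illustration:	The summary describes contexts whose covariates are each observed only with some probability, with the smallest such probability written as \(\zeta_{\min}\). The sketch below is one possible way to simulate that missingness model; it is not taken from the paper. It assumes that each coordinate is observed independently, that missing entries are zero-filled, and that the observation probabilities `zeta` are known. The helper names `mask_covariates` and `inverse_probability_rescale` are hypothetical, and the rescaling step is a generic inverse-probability correction, not necessarily the estimator the authors use.

```python
import numpy as np


def mask_covariates(x, zeta, rng):
    """Keep coordinate j with probability zeta[j]; zero-fill it otherwise.

    Assumption: coordinates go missing independently of each other.
    Returns the zero-filled array and the boolean observation mask.
    """
    observed = rng.random(x.shape) < zeta
    return np.where(observed, x, 0.0), observed


def inverse_probability_rescale(z, zeta):
    """Divide zero-filled covariates by zeta so that E[z / zeta] equals x."""
    return z / zeta


rng = np.random.default_rng(0)

d = 5
x = np.arange(1.0, d + 1)                     # hypothetical context vector
zeta = np.array([1.0, 0.9, 0.8, 0.7, 0.6])    # hypothetical observation probabilities
zeta_min = zeta.min()                         # minimum probability of observing a covariate

# Draw many independently masked copies of the same context.
n = 200_000
X = np.tile(x, (n, 1))
Z, observed = mask_covariates(X, zeta, rng)

# Averaging the rescaled copies should approximately recover x.
mean_est = inverse_probability_rescale(Z, zeta).mean(axis=0)
print("zeta_min:", zeta_min)
print("rescaled mean:", np.round(mean_est, 2))
```

Coordinates with smaller observation probability have noisier rescaled values, which gives some intuition for why a bound might depend on the minimum probability; the precise dependence is the one stated in the summary, not something this sketch establishes.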