CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded Stochastic Corruption

Bibliographic Details
Main Authors: Agrawal, Shubhada; Mathieu, Timothée; Basu, Debabrota; Maillard, Odalric-Ambrym
Format: Journal Article (preprint)
Language: English
Published: 28.09.2023
Subjects: Computer Science - Learning; Statistics - Machine Learning
DOI: 10.48550/arxiv.2309.16563
Online Access: https://arxiv.org/abs/2309.16563

Abstract
We investigate the regret-minimisation problem in a multi-armed bandit setting with arbitrary corruptions. Similar to the classical setup, the agent receives rewards generated independently from the distribution of the arm chosen at each time. However, these rewards are not directly observed. Instead, with a fixed $\varepsilon\in (0,\frac{1}{2})$, the agent observes a sample from the chosen arm's distribution with probability $1-\varepsilon$, or from an arbitrary corruption distribution with probability $\varepsilon$. Importantly, we impose no assumptions on these corruption distributions, which can be unbounded. In this setting, accommodating potentially unbounded corruptions, we establish a problem-dependent lower bound on regret for a given family of arm distributions. We introduce CRIMED, an asymptotically-optimal algorithm that achieves the exact lower bound on regret for bandits with Gaussian distributions with known variance. Additionally, we provide a finite-sample analysis of CRIMED's regret performance. Notably, CRIMED can effectively handle corruptions with $\varepsilon$ values as high as $\frac{1}{2}$. Furthermore, we develop a tight concentration result for medians in the presence of arbitrary corruptions, even with $\varepsilon$ values up to $\frac{1}{2}$, which may be of independent interest. We also discuss an extension of the algorithm for handling misspecification in the Gaussian model.
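To make the corruption model concrete, below is a minimal simulation sketch. It is not code from the paper; the arm parameters, the choice of corruption distribution, $\varepsilon$, and the sample size are all illustrative assumptions.

```python
# Minimal simulation sketch (not from the paper) of the observation model in
# the abstract: with a fixed epsilon in (0, 1/2), each observation comes from
# the chosen arm's distribution with probability 1 - epsilon, and from an
# arbitrary (possibly unbounded) corruption distribution with probability
# epsilon. All concrete numbers below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def observe(arm_mean, sigma, epsilon, n):
    """Draw n observations of an arm with N(arm_mean, sigma^2) rewards,
    each independently replaced by a corruption sample with prob. epsilon."""
    clean = rng.normal(arm_mean, sigma, size=n)
    # The corruption distribution is arbitrary; here it is a Gaussian placed
    # far from the arm mean, standing in for an adversarial, unbounded choice.
    corrupt = rng.normal(1e4, 1.0, size=n)
    is_corrupt = rng.random(n) < epsilon
    return np.where(is_corrupt, corrupt, clean)

obs = observe(arm_mean=1.0, sigma=1.0, epsilon=0.2, n=100_000)
# The empirical mean can be dragged arbitrarily far by the corruption, while
# the empirical median shifts only by a bounded, epsilon-dependent amount,
# which is the effect the paper's median concentration result quantifies.
print(f"empirical mean:   {np.mean(obs):8.2f}")    # roughly 2000: ruined
print(f"empirical median: {np.median(obs):8.2f}")  # roughly 1.3: bounded bias
```

As a quick sanity check (our computation, not a result quoted from the paper): for a $\mathcal{N}(\mu,\sigma^2)$ arm, the population median of the corrupted mixture can move by at most $\sigma\,\Phi^{-1}\!\big(\frac{1}{2(1-\varepsilon)}\big)$ whatever the corruption distribution, since the corruption controls at most an $\varepsilon$ fraction of the probability mass on either side of the median. This bound is finite for every $\varepsilon<\frac{1}{2}$ and diverges as $\varepsilon\to\frac{1}{2}$, consistent with the restriction $\varepsilon\in(0,\frac{1}{2})$ in the abstract.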
Copyright: http://arxiv.org/licenses/nonexclusive-distrib/1.0