Distributed Bandit Online Convex Optimization With Time-Varying Coupled Inequality Constraints

Bibliographic Details
Published in: IEEE Transactions on Automatic Control, Vol. 66, No. 10, pp. 4620-4635
Main Authors: Yi, Xinlei; Li, Xiuxian; Yang, Tao; Xie, Lihua; Chai, Tianyou; Johansson, Karl Henrik
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.10.2021

Summary: Distributed bandit online convex optimization with time-varying coupled inequality constraints is considered, motivated by a repeated game between a group of learners and an adversary. The learners attempt to minimize a sequence of global loss functions and at the same time satisfy a sequence of coupled constraint functions, where the constraints are coupled across the distributed learners at each round. The global loss and the coupled constraint functions are the sums of local convex loss and constraint functions, respectively, which are adaptively generated by the adversary. The local loss and constraint functions are revealed in a bandit manner, i.e., only the values of the loss and constraint functions are revealed to the learners at the sampling instance, and the revealed function values are held privately by each learner. Both one- and two-point bandit feedback are studied, with a corresponding distributed bandit online algorithm used by the learners in each setting. We show that these two algorithms achieve sublinear expected regret and constraint violation if the accumulated variation of the comparator sequence also grows sublinearly. In particular, $\mathcal{O}(T^{\theta})$ expected static regret and $\mathcal{O}(T^{7/4-\theta})$ constraint violation are achieved in the one-point bandit feedback setting, and $\mathcal{O}(T^{\max\{\kappa,1-\kappa\}})$ expected static regret and $\mathcal{O}(T^{1-\kappa/2})$ constraint violation in the two-point bandit feedback setting, where $\theta \in (3/4,5/6]$ and $\kappa \in (0,1)$ are user-defined tradeoff parameters. Finally, the tightness of the theoretical results is illustrated by numerical simulations of a simple power grid example, which also compare the proposed algorithms to existing algorithms in the literature.
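For readers unfamiliar with bandit feedback, the sketch below illustrates the standard one-point and two-point gradient estimators that underlie algorithms of this kind: each learner queries only function values and forms a randomized gradient estimate from them. This is a minimal, hypothetical illustration under assumed step-size and exploration schedules, not the authors' distributed algorithm; in particular, it omits the coupled constraints, the dual variables, and the communication between learners.

```python
import numpy as np

def random_unit_vector(d, rng):
    """Sample a direction uniformly from the unit sphere in R^d."""
    u = rng.standard_normal(d)
    return u / np.linalg.norm(u)

def one_point_gradient_estimate(f, x, delta, rng):
    """One-point bandit estimate: a single value f(x + delta*u) gives an
    unbiased estimate of the gradient of a smoothed version of f."""
    d = x.size
    u = random_unit_vector(d, rng)
    return (d / delta) * f(x + delta * u) * u

def two_point_gradient_estimate(f, x, delta, rng):
    """Two-point bandit estimate: two values at x +/- delta*u give a
    lower-variance estimate of the same smoothed gradient."""
    d = x.size
    u = random_unit_vector(d, rng)
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

# Illustrative use: a single learner runs projected bandit gradient descent on
# a fixed quadratic loss (the time-varying adversarial losses, the coupled
# constraints, and the consensus step between learners are all omitted).
rng = np.random.default_rng(0)
loss = lambda x: float(np.sum((x - 1.0) ** 2))
x = np.zeros(5)
for t in range(1, 1001):
    eta = 0.5 / np.sqrt(t)        # diminishing step size (assumed schedule)
    delta = 0.1 / t ** 0.25       # shrinking exploration radius (assumed)
    g = two_point_gradient_estimate(loss, x, delta, rng)
    x = np.clip(x - eta * g, -2.0, 2.0)   # projection onto the box [-2, 2]^5
print(x)  # approaches the unconstrained minimizer (1, ..., 1)
```

Loosely speaking, the lower variance of the two-point estimator compared with the one-point estimator is what allows the sharper regret and constraint-violation bounds quoted in the summary above.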
ISSN: 0018-9286
EISSN: 1558-2523
DOI: 10.1109/TAC.2020.3030883