Global and Local Convergence Analysis of a Bandit Learning Algorithm in Merely Coherent Games
| Published in | IEEE Open Journal of Control Systems, Vol. 2, pp. 366–379 |
---|---|
| Format | Journal Article |
| Language | English |
| Published | IEEE, 2023 |
Summary: Non-cooperative games serve as a powerful framework for capturing the interactions among self-interested players and have broad applicability in modeling a wide range of practical scenarios, ranging from power management to path planning of self-driving vehicles. Although most existing solution algorithms assume the availability of first-order information or full knowledge of the objectives and others' action profiles, there are situations where the only accessible information at players' disposal is the realized objective function values. In this article, we devise a bandit online learning algorithm that integrates the optimistic mirror descent scheme and multi-point pseudo-gradient estimates. We further prove that the generated actual sequence of play converges almost surely to a critical point if the game under study is globally merely coherent, without resorting to extra Tikhonov regularization terms or additional norm conditions. We also discuss the convergence properties of the proposed bandit learning algorithm in locally merely coherent games. Finally, we illustrate the validity of the proposed algorithm via two two-player minimax problems and a cognitive radio bandwidth allocation game.
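To make the two ingredients named in the summary concrete, the following is a minimal sketch, not the paper's actual algorithm: it assumes a Euclidean mirror map (so optimistic mirror descent reduces to an optimistic gradient step), a standard two-point sphere-sampling pseudo-gradient estimator built only from realized function values, and a single player minimizing a smooth objective. The function names `multipoint_pseudo_gradient` and `optimistic_mirror_descent` are illustrative, not from the article.

```python
import numpy as np

def multipoint_pseudo_gradient(f, x, delta, rng):
    """Estimate the gradient of f at x using only realized function
    values: query f at two points perturbed along a random unit
    direction u, and scale the finite difference back to dimension d."""
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # uniform direction on the unit sphere
    # Two-point estimate: (d / (2*delta)) * (f(x+delta u) - f(x-delta u)) * u
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

def optimistic_mirror_descent(f, x0, steps=2000, eta=0.05, delta=1e-3, seed=0):
    """Euclidean optimistic mirror descent driven by bandit
    pseudo-gradient estimates.  The 'optimistic' correction uses the
    previous estimate g_prev to extrapolate: x <- x - eta*(2g - g_prev)."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float)
    g_prev = np.zeros_like(x)
    for _ in range(steps):
        g = multipoint_pseudo_gradient(f, x, delta, rng)
        x = x - eta * (2.0 * g - g_prev)
        g_prev = g
    return x
```

In a game setting, each player would run such an update on its own action using only its realized payoff; the Euclidean step above would be replaced by the paper's mirror map and the estimator by its specific multi-point scheme.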
ISSN: 2694-085X
DOI: 10.1109/OJCSYS.2023.3316071