Global and Local Convergence Analysis of a Bandit Learning Algorithm in Merely Coherent Games
| Published in | IEEE Open Journal of Control Systems, Vol. 2, pp. 366–379 |
---|---|
| Format | Journal Article |
| Language | English |
| Published | IEEE, 2023 |
Summary: Non-cooperative games serve as a powerful framework for capturing the interactions among self-interested players and have broad applicability in modeling a wide range of practical scenarios, ranging from power management to path planning of self-driving vehicles. Although most existing solution algorithms assume the availability of first-order information or full knowledge of the objectives and others' action profiles, there are situations where the only accessible information at players' disposal is the realized objective function values. In this article, we devise a bandit online learning algorithm that integrates the optimistic mirror descent scheme and multi-point pseudo-gradient estimates. We further prove that the generated actual sequence of play converges almost surely to a critical point if the game under study is globally merely coherent, without resorting to extra Tikhonov regularization terms or additional norm conditions. We also discuss the convergence properties of the proposed bandit learning algorithm in locally merely coherent games. Finally, we illustrate the validity of the proposed algorithm via two two-player minimax problems and a cognitive radio bandwidth allocation game.
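To make the two ingredients named in the summary concrete, the following is a minimal sketch, not the paper's actual algorithm: it assumes a Euclidean mirror map (so optimistic mirror descent reduces to an optimistic gradient step), a standard two-point sphere-sampling pseudo-gradient estimator built only from realized function values, and a single player minimizing a smooth objective. The function names `multipoint_pseudo_gradient` and `optimistic_mirror_descent` are illustrative, not from the article.

```python
import numpy as np

def multipoint_pseudo_gradient(f, x, delta, rng):
    """Estimate the gradient of f at x using only realized function
    values: query f at two points perturbed along a random unit
    direction u, and scale the finite difference back to dimension d."""
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # uniform direction on the unit sphere
    # Two-point estimate: (d / (2*delta)) * (f(x+delta u) - f(x-delta u)) * u
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

def optimistic_mirror_descent(f, x0, steps=2000, eta=0.05, delta=1e-3, seed=0):
    """Euclidean optimistic mirror descent driven by bandit
    pseudo-gradient estimates.  The 'optimistic' correction uses the
    previous estimate g_prev to extrapolate: x <- x - eta*(2g - g_prev)."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float)
    g_prev = np.zeros_like(x)
    for _ in range(steps):
        g = multipoint_pseudo_gradient(f, x, delta, rng)
        x = x - eta * (2.0 * g - g_prev)
        g_prev = g
    return x
```

In a game setting, each player would run such an update on its own action using only its realized payoff; the Euclidean step above would be replaced by the paper's mirror map and the estimator by its specific multi-point scheme.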
ISSN: 2694-085X
DOI: 10.1109/OJCSYS.2023.3316071