Global and Local Convergence Analysis of a Bandit Learning Algorithm in Merely Coherent Games

Bibliographic Details
Published in: IEEE Open Journal of Control Systems, Vol. 2, pp. 366-379
Main Authors: Huang, Yuanhanqing; Hu, Jianghai
Format: Journal Article
Language: English
Published: IEEE, 2023

Summary: Non-cooperative games are a powerful framework for capturing interactions among self-interested players, with broad applicability to practical scenarios ranging from power management to path planning for self-driving vehicles. Although most existing solution algorithms assume the availability of first-order information or full knowledge of the objectives and the other players' action profiles, in some situations the only information at a player's disposal is the realized values of its objective function. In this article, we devise a bandit online learning algorithm that integrates the optimistic mirror descent scheme with multi-point pseudo-gradient estimates. We further prove that the actual sequence of play converges almost surely to a critical point when the game under study is globally merely coherent, without resorting to extra Tikhonov regularization terms or additional norm conditions. We also discuss the convergence properties of the proposed algorithm in locally merely coherent games. Finally, we illustrate its validity on two two-player minimax problems and a cognitive radio bandwidth allocation game.
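The paper's algorithm is not reproduced in this record, but the two ingredients named in the summary — an optimistic (extrapolated) update and pseudo-gradient estimation from realized objective values only — can be sketched as follows. This is a minimal illustration, assuming the unconstrained Euclidean case, a two-point estimator, a hand-picked bilinear minimax game, and illustrative step and sampling parameters; it is not the authors' exact scheme, which uses a mirror descent setup and multi-point estimates.

```python
import numpy as np

def two_point_grad(f, x, delta, rng):
    """Pseudo-gradient estimate from realized objective values only:
    query f at x +/- delta*u for a random unit direction u (two-point
    bandit feedback), then scale the finite difference by the dimension."""
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

# Illustrative bilinear minimax game f(x, y) = x.y: player 1 minimizes
# over x, player 2 maximizes over y; the unique critical point is (0, 0).
rng = np.random.default_rng(0)
x, y = np.array([1.0]), np.array([1.0])
gx_prev, gy_prev = np.zeros(1), np.zeros(1)
eta, delta = 0.1, 1e-3  # hypothetical step size and sampling radius

for _ in range(3000):
    # optimistic (extrapolated) half-step reusing the previous estimates
    xh, yh = x - eta * gx_prev, y + eta * gy_prev
    # each player observes only realized payoff values at the queried points
    gx = two_point_grad(lambda z: z @ yh, xh, delta, rng)
    gy = two_point_grad(lambda z: xh @ z, yh, delta, rng)
    x, y = x - eta * gx, y + eta * gy
    gx_prev, gy_prev = gx, gy
```

On this bilinear game, plain gradient descent-ascent cycles or diverges, while the optimistic update drives the realized play toward the saddle point (0, 0), which is the kind of convergence of the actual sequence of play the summary refers to.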
ISSN: 2694-085X
DOI: 10.1109/OJCSYS.2023.3316071