Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton–Jacobi equations

Bibliographic Details
Published in	Automatica (Oxford), Vol. 47, no. 8, pp. 1556-1569
Main Authors	Vamvoudakis, Kyriakos G.; Lewis, Frank L.
Format	Journal Article
Language	English
Published	Kidlington: Elsevier Ltd, 01.08.2011
Subjects	Adaptive systems | Adaptive control | Adaptive optimal control | Adaptive algorithm | Algorithms | Applied sciences | Closed feedback | Computer science; control theory; systems | Continuous time | Control theory. Systems | Coupled Hamilton–Jacobi equations | Coupled Riccati equations | Exact sciences and technology | Game theory | Games | Hamilton Jacobi equation | Infinite horizon | Mathematical analysis | Minimum time | Multi-player games | Nash equilibrium | Networks | Non linear control | Non linear system | Non zero sum game | On-line systems | Online | Operational research and scientific management | Operational research. Management science | Optimal algorithm | Optimal approximation | Optimal control | Persistence | Persistence of excitation | Players | Real time | Reinforcement learning | Riccati equation | Zero sum game
Summary:	In this paper we present an online adaptive control algorithm, based on policy iteration reinforcement learning techniques, that solves the continuous-time (CT) multi-player non-zero-sum (NZS) game with infinite horizon for linear and nonlinear systems. NZS games allow players to have a cooperative team component and an individual selfish component of strategy. The adaptive algorithm learns online the solution of the coupled Riccati equations for linear systems and of the coupled Hamilton–Jacobi equations for nonlinear systems. The method finds approximations of the optimal value and the NZS Nash equilibrium in real time, while also guaranteeing closed-loop stability. The optimal-adaptive algorithm is implemented as a separate actor/critic parametric network approximator structure for every player, and involves simultaneous continuous-time adaptation of the actor and critic networks. A persistence of excitation condition is shown to guarantee convergence of every critic to the actual optimal value function for that player. A detailed mathematical analysis is given for 2-player NZS games, and novel tuning algorithms are given for the actor/critic networks. Convergence to the Nash equilibrium is proven, and stability of the system is also guaranteed. This provides optimal adaptive control solutions for both non-zero-sum games and their special case, zero-sum games. Simulation examples show the effectiveness of the new algorithm.
ISSN:	0005-1098, 1873-2836
DOI:	10.1016/j.automatica.2011.03.005
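
To make the "coupled Riccati equations" named in the summary concrete, the following is a minimal sketch for a scalar two-player linear-quadratic game. It is not the online actor/critic algorithm of the paper: it is a plain offline policy iteration that assumes the system parameters are known. The function names, the system (a, b) and the cost weights (q, r) are all hypothetical and chosen only for illustration.

```python
# Hypothetical scalar two-player linear-quadratic game (illustrative numbers only):
#   dx/dt = a*x + b[0]*u1 + b[1]*u2
#   J_i   = integral over [0, inf) of (q[i]*x^2 + r[i][0]*u1^2 + r[i][1]*u2^2) dt
# With linear feedback u_j = -k[j]*x, each player's value is V_i(x) = p[i]*x^2.


def solve_scalar_nzs_game(a, b, q, r, k0=(0.0, 0.0), tol=1e-12, max_iter=200):
    """Offline policy iteration on the scalar coupled Riccati equations.

    Assumes the initial gains k0 are stabilizing (a - b[0]*k0[0] - b[1]*k0[1] < 0).
    Convergence is not guaranteed for arbitrary parameters.
    """
    k = list(k0)
    for _ in range(max_iter):
        a_cl = a - b[0] * k[0] - b[1] * k[1]  # closed-loop pole
        if a_cl >= 0:
            raise ValueError("current gains are not stabilizing")
        # Policy evaluation: solve 2*a_cl*p_i + q_i + sum_j r_ij*k_j^2 = 0 for p_i.
        p = [
            (q[i] + r[i][0] * k[0] ** 2 + r[i][1] * k[1] ** 2) / (-2.0 * a_cl)
            for i in range(2)
        ]
        # Policy improvement: u_i = -(1/2) * (1/r_ii) * b_i * dV_i/dx = -(b_i*p_i/r_ii)*x.
        k_new = [b[i] * p[i] / r[i][i] for i in range(2)]
        if max(abs(k_new[i] - k[i]) for i in range(2)) < tol:
            return p, k_new
        k = k_new
    raise RuntimeError("policy iteration did not converge")


def riccati_residuals(a, b, q, r, p):
    """Residuals of the two coupled scalar Riccati equations at the candidate p."""
    k = [b[i] * p[i] / r[i][i] for i in range(2)]
    a_cl = a - b[0] * k[0] - b[1] * k[1]
    return [
        2.0 * a_cl * p[i] + q[i] + r[i][0] * k[0] ** 2 + r[i][1] * k[1] ** 2
        for i in range(2)
    ]


# Example: stable plant, each player penalizes the state and only its own control.
a, b = -1.0, (1.0, 1.0)
q = (1.0, 1.0)
r = ((1.0, 0.0), (0.0, 1.0))
p, k = solve_scalar_nzs_game(a, b, q, r)
print("value coefficients:", p)  # both approach 1/3 for these numbers
print("feedback gains:", k)
print("Riccati residuals:", riccati_residuals(a, b, q, r, p))
```

For these illustrative numbers the two equations are symmetric and reduce to 3p² + 2p − 1 = 0, so both value coefficients approach 1/3. The adaptive method described in the summary instead tunes actor and critic networks online, which this offline sketch does not attempt.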