Q-learning solution for optimal consensus control of discrete-time multiagent systems using reinforcement learning

Bibliographic Details
Published in: Journal of the Franklin Institute, Vol. 356, no. 13, pp. 6946–6967
Main Authors: Mu, Chaoxu; Zhao, Qian; Gao, Zhongke; Sun, Changyin
Format: Journal Article
Language: English
Published: Elsevier Ltd, Elmsford, 01.09.2019

Summary: This paper investigates a Q-learning scheme for the optimal consensus control of discrete-time multiagent systems. The Q-learning algorithm is carried out by reinforcement learning (RL) using system data instead of system dynamics information. In the multiagent system, the agents interact with each other and at least one agent can communicate with the leader directly, which is described by an algebraic graph structure. The objective is to make all agents synchronize with the leader and to drive the performance indices to a Nash equilibrium. On one hand, the solutions of the optimal consensus control problem for multiagent systems are obtained by solving the coupled Hamilton–Jacobi–Bellman (HJB) equation; however, analytical solutions of the discrete-time HJB equation are difficult to obtain directly. On the other hand, accurate mathematical models of most real-world systems are hard to obtain. To overcome these difficulties, a Q-learning algorithm is developed using system data rather than an accurate system model. We formulate the performance index and the corresponding Bellman equation of each agent i, and then derive the Q-function Bellman equation on the basis of the Q-function. Policy iteration is adopted to compute the optimal control iteratively, and the least-squares (LS) method is employed to facilitate the implementation process. A stability analysis of the proposed policy-iteration-based Q-learning algorithm for multiagent systems is given. Two simulation examples are presented to verify the effectiveness of the proposed scheme.
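
As a rough illustration of the data-driven policy iteration the summary describes, the following is a minimal single-agent sketch in Python/NumPy: a quadratic Q-function is fitted to the Q-function Bellman equation by least squares from sampled transitions, and the policy is improved from the fitted kernel. The dynamics matrices A and B, the cost weights, the exploration-noise level, and the single-agent setting are illustrative assumptions, not the paper's multiagent graph-coupled formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed discrete-time linear dynamics x_{k+1} = A x_k + B u_k (illustrative values only).
A = np.array([[0.95, 0.10],
              [0.00, 0.90]])
B = np.array([[0.0],
              [0.1]])
Qc = np.eye(2)   # state weight in the stage cost x^T Qc x + u^T R u
R = np.eye(1)    # control weight
n, m = B.shape

def phi(x, u):
    # Quadratic basis: upper-triangular entries of z z^T with z = [x; u];
    # off-diagonal terms are doubled so that phi(x, u) . theta = z^T H z.
    z = np.concatenate([x, u])
    i, j = np.triu_indices(n + m)
    scale = np.where(i == j, 1.0, 2.0)
    return scale * np.outer(z, z)[i, j]

def evaluate_policy(K, num_samples=200):
    # Least-squares fit of the Q-function Bellman equation
    # Q(x_k, u_k) = r(x_k, u_k) + Q(x_{k+1}, -K x_{k+1}) under the policy u = -K x.
    Phi, r = [], []
    for _ in range(num_samples):
        x = rng.normal(size=n)
        u = -K @ x + 0.1 * rng.normal(size=m)   # exploration noise for excitation
        x_next = A @ x + B @ u
        u_next = -K @ x_next                    # the current policy at the next step
        Phi.append(phi(x, u) - phi(x_next, u_next))
        r.append(x @ Qc @ x + u @ R @ u)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(r), rcond=None)
    H = np.zeros((n + m, n + m))                # rebuild the symmetric kernel H
    H[np.triu_indices(n + m)] = theta
    return H + H.T - np.diag(np.diag(H))

K = np.zeros((m, n))            # initial policy (stabilizing here, since A is stable)
for _ in range(10):             # policy iteration loop
    H = evaluate_policy(K)
    Huu, Hux = H[n:, n:], H[n:, :n]
    K = np.linalg.solve(Huu, Hux)   # improvement: u = -Huu^{-1} Hux x minimizes Q(x, u)

print("Learned feedback gain K:\n", K)
```

The exploration noise keeps the least-squares regression well conditioned; no knowledge of A or B is used in the learning loop, only sampled transitions and stage costs, which is the model-free character of the scheme the abstract emphasizes.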
ISSN: 0016-0032
EISSN: 1879-2693
DOI: 10.1016/j.jfranklin.2019.06.007