Cooperative learning with joint state value approximation for multi-agent systems
Published in | Journal of Control Theory and Applications, Vol. 11, no. 2, pp. 149–155 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | Heidelberg, 01.05.2013 (South China University of Technology and Academy of Mathematics and Systems Science, CAS; School of Information Science and Engineering, Central South University, Changsha, Hunan 410083, China) |
Summary: | This paper addresses the 'curse of dimensionality' problem, which makes reinforcement learning intractable when scaled to multi-agent systems: the joint state-action space grows exponentially with the number of agents, leading to large memory requirements and slow learning. For cooperative systems, which are widespread among multi-agent systems, the paper proposes a new multi-agent Q-learning algorithm that decomposes joint state and joint action learning into two processes: learning individual actions, and approximating the maximum value of the joint state. The latter process accounts for the other agents' actions to ensure that the joint action is optimal, and it supports the updating of the former. Simulation results illustrate that the proposed algorithm learns the optimal joint behavior with smaller memory and faster learning speed compared with friend-Q learning and independent learning. |
Bibliography: | 44-1600/TP; Multi-agent system; Q-learning; Cooperative system; Curse of dimensionality; Decomposition |
ISSN: | 1672-6340 1993-0623 |
DOI: | 10.1007/s11768-013-1141-z |
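The decomposition described in the summary can be sketched in code. The following is a minimal illustrative sketch under assumed names and update rules (`Agent`, `JointValue`, a running-maximum value update), not the paper's exact algorithm: each agent learns a small individual Q-table over its own state and actions, while a shared approximation of the maximal joint-state value supplies the bootstrap target, so no full joint-action Q-table is needed.

```python
import random
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9  # learning rate and discount factor (assumed values)

class Agent:
    """Learns an individual Q-table Q_i(s_i, a_i) instead of a joint-action table."""
    def __init__(self, actions):
        self.actions = actions
        self.q = defaultdict(float)  # keyed by (own state, own action)

    def act(self, s, eps=0.1):
        # epsilon-greedy over the agent's own actions only
        if random.random() < eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def update(self, s, a, reward, v_next):
        # v_next approximates the maximum value of the next joint state,
        # which implicitly accounts for the other agents' actions.
        target = reward + GAMMA * v_next
        self.q[(s, a)] += ALPHA * (target - self.q[(s, a)])

class JointValue:
    """Shared approximation of the maximum value of each joint state."""
    def __init__(self):
        self.v = defaultdict(float)  # keyed by joint state

    def update(self, joint_s, reward, joint_s_next):
        # keep a running maximum of observed bootstrap targets as an
        # approximation of the best achievable value from joint_s
        target = reward + GAMMA * self.v[joint_s_next]
        self.v[joint_s] = max(self.v[joint_s], target)
```

With n agents, each agent stores only its own |S_i|·|A_i| entries plus one shared joint-state value table, which grows far more slowly than a full joint-action Q-table over the product of all agents' action sets, consistent with the memory savings the abstract claims over friend-Q and independent learning.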