Attentional Factorized Q-Learning for Many-Agent Learning

Bibliographic Details
Published in: IEEE Access, Vol. 10, p. 1
Main Authors: Wang, Xiaoqiang; Ke, Liangjun; Fu, Qiang
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2022

Summary: The difficulty of Multi-Agent Reinforcement Learning (MARL) increases with the number of agents in the system. Value function decomposition is an effective way to alleviate the curse of dimensionality. However, existing methods usually either provide only a low-order approximate decomposition (no higher than second order) or require considerable manual effort to design the high-order interactions among agents from experience. As a result, existing methods either tend to incur large decomposition error or are inconvenient to use. In this paper, a high-order approximate value function decomposition method is proposed with the following prominent characteristics: low-rank vectors are exploited to represent the value function, the low-order and high-order components share the same input (i.e., the embedding vector), an attention mechanism is used to select the agents participating in high-order interactions, and all agents share model parameters when the agents are homogeneous. To our knowledge, this is the first MARL method that simultaneously models low- and high-order interactions among agents and can be trained end-to-end. Extensive experiments on two different multi-agent problems demonstrate the performance gain of the proposed approach over strong baselines, particularly when there are a large number of agents.
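
The summary's stated ingredients (per-agent low-rank embeddings feeding both a low-order term and an attention-gated high-order term, with parameters shared across homogeneous agents) can be sketched as a factorization-machine-style value mixer. The following is a minimal, illustrative PyTorch sketch under those assumptions, not the authors' published architecture; all names (AttentionalFactorizedMixer, obs_dim, embed_dim) are hypothetical.

import torch
import torch.nn as nn

class AttentionalFactorizedMixer(nn.Module):
    """Mix per-agent Q-values into a joint value Q_tot (illustrative sketch).

    Q_tot = sum_i q_i                      (low-order term)
          + sum_{i != j} a_ij <v_i, v_j>   (attention-gated second-order term)

    where v_i is a low-rank embedding of agent i's observation and a_ij is an
    attention weight selecting which agent pairs participate in the
    high-order interaction.
    """

    def __init__(self, obs_dim: int, embed_dim: int = 32):
        super().__init__()
        # Parameters are shared across (homogeneous) agents.
        self.embed = nn.Linear(obs_dim, embed_dim)  # low-rank embedding v_i
        self.query = nn.Linear(embed_dim, embed_dim)
        self.key = nn.Linear(embed_dim, embed_dim)
        self.scale = embed_dim ** -0.5

    def forward(self, q_values: torch.Tensor, obs: torch.Tensor) -> torch.Tensor:
        # q_values: (batch, n_agents); obs: (batch, n_agents, obs_dim)
        v = self.embed(obs)                                   # (B, N, D)
        # Attention scores over agent pairs select the interacting agents.
        attn = torch.softmax(
            self.query(v) @ self.key(v).transpose(1, 2) * self.scale, dim=-1
        )                                                     # (B, N, N)
        pairwise = v @ v.transpose(1, 2)                      # <v_i, v_j>
        # Mask the diagonal so an agent does not interact with itself.
        eye = torch.eye(v.size(1), device=v.device).bool()
        high_order = (attn * pairwise).masked_fill(eye, 0.0).sum(dim=(1, 2))
        return q_values.sum(dim=-1) + high_order              # (B,)

# Usage: mix 8 agents' Q-values into one joint value per batch element.
mixer = AttentionalFactorizedMixer(obs_dim=16)
q_tot = mixer(torch.randn(4, 8), torch.randn(4, 8, 16))
print(q_tot.shape)  # torch.Size([4])

In this sketch the attention weights a_ij play the selection role the summary describes, gating which pairs contribute to the high-order term; extending the gated products beyond pairs would yield the genuinely higher-order interactions the paper targets.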
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2022.3214481