Attentional Factorized Q-Learning for Many-Agent Learning

Bibliographic Details
Published in: IEEE Access, Vol. 10, p. 1
Main Authors: Wang, Xiaoqiang; Ke, Liangjun; Fu, Qiang
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2022

Summary: The difficulty of Multi-Agent Reinforcement Learning (MARL) increases with the number of agents in the system. Value function decomposition is an effective way to alleviate the curse of dimensionality. However, existing methods usually either provide only a low-order approximate decomposition (no higher than second order) or require considerable manual effort to design the high-order interactions among agents from experience. As a result, existing methods either tend to incur large decomposition error or are inconvenient to use. In this paper, a high-order approximate value function decomposition method is proposed with the following prominent characteristics: low-rank vectors are exploited to represent the value function, the low-order and high-order components share the same input (i.e., the embedding vector), an attention mechanism is used to select the agents participating in high-order interactions, and all agents share model parameters when the agents are homogeneous. To our knowledge, this is the first MARL method that simultaneously models low- and high-order interactions among agents and can be trained end-to-end. Extensive experiments on two different multi-agent problems demonstrate the performance gain of the proposed approach over strong baselines, particularly when there are a large number of agents.
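
The summary's stated ingredients (per-agent low-rank embeddings feeding both a low-order term and an attention-gated high-order term, with parameters shared across homogeneous agents) can be sketched as a factorization-machine-style value mixer. The following is a minimal, illustrative PyTorch sketch under those assumptions, not the authors' published architecture; all names (AttentionalFactorizedMixer, obs_dim, embed_dim) are hypothetical.

import torch
import torch.nn as nn

class AttentionalFactorizedMixer(nn.Module):
    """Mix per-agent Q-values into a joint value Q_tot (illustrative sketch).

    Q_tot = sum_i q_i                      (low-order term)
          + sum_{i != j} a_ij <v_i, v_j>   (attention-gated second-order term)

    where v_i is a low-rank embedding of agent i's observation and a_ij is an
    attention weight selecting which agent pairs participate in the
    high-order interaction.
    """

    def __init__(self, obs_dim: int, embed_dim: int = 32):
        super().__init__()
        # Parameters are shared across (homogeneous) agents.
        self.embed = nn.Linear(obs_dim, embed_dim)  # low-rank embedding v_i
        self.query = nn.Linear(embed_dim, embed_dim)
        self.key = nn.Linear(embed_dim, embed_dim)
        self.scale = embed_dim ** -0.5

    def forward(self, q_values: torch.Tensor, obs: torch.Tensor) -> torch.Tensor:
        # q_values: (batch, n_agents); obs: (batch, n_agents, obs_dim)
        v = self.embed(obs)                                   # (B, N, D)
        # Attention scores over agent pairs select the interacting agents.
        attn = torch.softmax(
            self.query(v) @ self.key(v).transpose(1, 2) * self.scale, dim=-1
        )                                                     # (B, N, N)
        pairwise = v @ v.transpose(1, 2)                      # <v_i, v_j>
        # Mask the diagonal so an agent does not interact with itself.
        eye = torch.eye(v.size(1), device=v.device).bool()
        high_order = (attn * pairwise).masked_fill(eye, 0.0).sum(dim=(1, 2))
        return q_values.sum(dim=-1) + high_order              # (B,)

# Usage: mix 8 agents' Q-values into one joint value per batch element.
mixer = AttentionalFactorizedMixer(obs_dim=16)
q_tot = mixer(torch.randn(4, 8), torch.randn(4, 8, 16))
print(q_tot.shape)  # torch.Size([4])

In this sketch the attention weights a_ij play the selection role the summary describes, gating which pairs contribute to the high-order term; extending the gated products beyond pairs would yield the genuinely higher-order interactions the paper targets.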
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2022.3214481