Multi-agent adaptive routing by multi-head-attention-based twin agents using reinforcement learning

Bibliographic Details
Published in: Nauchno-tekhnicheskiĭ vestnik informat͡s︡ionnykh tekhnologiĭ, mekhaniki i optiki, Vol. 22, no. 6, pp. 1178-1186
Main Authors: Gribanov, T.A., Filchenkov, A.A., Azarov, A.A., Shalyto, A.A.
Format: Journal Article
Language: English
Published: Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University), 01.12.2022
Summary: Graph variability is a common condition in packet routing, cargo transportation, and flow control. Adaptive routing algorithms based on reinforcement learning are designed to solve the routing problem under this condition. However, when the graph changes significantly, existing routing algorithms require complete retraining. To address this, we propose a novel method based on multi-agent modeling with twin agents, together with a new neural network architecture for them that uses multi-head self-attention and is pre-trained within the multi-view learning paradigm. In this paradigm, an agent takes a vertex as input; twins of the main agent are placed at the vertices of the graph, and each twin selects the neighbor to which the object should be transferred. We compared the method with the existing multi-agent routing algorithm DQN-LE-routing in two stages: pre-training and simulation. In both stages, we considered runs in which the topology changed during testing or simulation. Experiments showed that the proposed adaptability enhancement method provides global adaptability: after global changes occur, delivery time increases by only 14.5 %. The proposed method can be used to solve routing problems with complex path evaluation functions and dynamically changing graph topologies, for example, in transport logistics and in managing conveyor belts in production.
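The summary describes the twins' decision step only at a high level. As a rough illustration, here is a minimal pure-Python sketch built on several hypothetical assumptions. First, every twin references one shared multi-head attention scorer rather than a copy. Second, vertices are represented by fixed-length embedding vectors. Third, the scaled dot-product attention weights over a vertex's current neighbors are used directly as preference scores. The names `SharedAttentionScorer`, `TwinAgent`, and `choose_neighbor`, the random untrained weights, and the toy graph are illustrative inventions. They are not the authors' architecture, which also involves multi-view pre-training and reinforcement learning that are not shown here.

```python
import math
import random
from typing import Dict, List, Set

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: List[Vector], v: Vector) -> Vector:
    # m has shape (rows, len(v)); returns the product m @ v.
    return [dot(row, v) for row in m]


def softmax(xs: List[float]) -> List[float]:
    top = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class SharedAttentionScorer:
    """Hypothetical multi-head attention scorer shared by all twins.

    Weights are random and untrained; only the data flow is illustrated.
    """

    def __init__(self, dim: int, heads: int, head_dim: int, seed: int = 0):
        rng = random.Random(seed)

        def rand_matrix(rows: int, cols: int) -> List[Vector]:
            return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)]
                    for _ in range(rows)]

        self.head_dim = head_dim
        # Per head: a query projection of [current ; target] and a key
        # projection of each neighbor embedding.
        self.w_q = [rand_matrix(head_dim, 2 * dim) for _ in range(heads)]
        self.w_k = [rand_matrix(head_dim, dim) for _ in range(heads)]

    def scores(self, current: Vector, target: Vector,
               neighbors: List[Vector]) -> List[float]:
        query_input = current + target  # list concatenation
        combined = [0.0] * len(neighbors)
        for w_q, w_k in zip(self.w_q, self.w_k):
            q = matvec(w_q, query_input)
            logits = [dot(q, matvec(w_k, n)) / math.sqrt(self.head_dim)
                      for n in neighbors]
            # Scaled dot-product attention weights, averaged over heads.
            for i, weight in enumerate(softmax(logits)):
                combined[i] += weight / len(self.w_q)
        return combined


class TwinAgent:
    """A twin placed at one vertex; it references, not copies, the scorer."""

    def __init__(self, vertex: str, scorer: SharedAttentionScorer):
        self.vertex = vertex
        self.scorer = scorer

    def choose_neighbor(self, graph: Dict[str, Set[str]],
                        emb: Dict[str, Vector], target: str) -> str:
        neighbors = sorted(graph[self.vertex])  # current topology only
        s = self.scorer.scores(emb[self.vertex], emb[target],
                               [emb[n] for n in neighbors])
        return neighbors[max(range(len(neighbors)), key=s.__getitem__)]


# Toy undirected graph and random 4-dimensional vertex embeddings.
graph: Dict[str, Set[str]] = {
    "a": {"b", "c"}, "b": {"a", "c", "d"},
    "c": {"a", "b", "d"}, "d": {"b", "c"},
}
rng = random.Random(42)
emb = {v: [rng.uniform(-1.0, 1.0) for _ in range(4)] for v in graph}
scorer = SharedAttentionScorer(dim=4, heads=2, head_dim=3)
twins = {v: TwinAgent(v, scorer) for v in graph}

hop = twins["a"].choose_neighbor(graph, emb, target="d")
# Topology change: edge a-b disappears; the same weights are reused as-is.
graph["a"].discard("b")
graph["b"].discard("a")
hop_after = twins["a"].choose_neighbor(graph, emb, target="d")
print(hop, hop_after)
```

The scorer receives each vertex's neighbors as a variable-length list of embeddings instead of using a fixed output unit per neighbor. As a result, the same weights still apply after an edge is removed, and no layer has to be resized. Whether the resulting choices remain good after such a change is what training, which this sketch omits, has to ensure.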
ISSN: 2226-1494; 2500-0373
DOI: 10.17586/2226-1494-2022-22-6-1178-1186