Multi-Agent Reinforcement Learning Approach for Scheduling in In-X Subnetworks
Published in: IEEE Vehicular Technology Conference, pp. 1 - 7
Main Authors: , ,
Format: Conference Proceeding
Language: English
Published: IEEE, 07.10.2024
Summary: We consider radio resource scheduling in a network of multiple non-coordinated in-X subnetworks that move with respect to each other. Each subnetwork is controlled by an independent agent that schedules resources to the devices within it. The only information about the decisions of other agents comes through interference measurements, which are non-stationary due to subnetwork mobility and fast-fading effects. Each agent aims to serve the devices in its subnetwork with a fixed data rate and high reliability. The problem is cast as a multi-agent non-stationary Markov Decision Process (MDP) with unknown transition functions. We approach the problem via Multi-Agent Deep Reinforcement Learning (DRL), leveraging Long Short-Term Memory (LSTM) networks to handle the non-stationarity and Deep Deterministic Policy Gradient (DDPG) to manage the high-dimensional continuous action space. Candidate actions given by the DRL policy are quantized to discrete actions by a novel binary tree search method subject to reliability constraints. Simulation results indicate that the proposed LSTM-based DRL scheduling strategy outperforms strategies based on Feed-Forward Neural Networks, Centralized Training with Decentralized Execution approaches from the literature, and conventional heuristic approaches.
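
This record only names the architecture, so the following is a minimal, hypothetical sketch of the kind of LSTM-based DDPG actor the abstract describes: a recurrent network that maps a history of local interference measurements to a continuous candidate allocation vector with one score per channel. All names, shapes, and hyperparameters (`LSTMActor`, `obs_dim`, `num_channels`, `hidden_dim`) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of an LSTM-based DDPG actor for per-subnetwork
# scheduling; shapes and names are assumptions, not the paper's design.
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    """Maps a history of local interference measurements to a continuous
    candidate resource-allocation vector (one score per channel)."""

    def __init__(self, obs_dim: int, num_channels: int, hidden_dim: int = 64):
        super().__init__()
        # The LSTM summarizes the non-stationary measurement history,
        # which is the abstract's stated reason for using recurrence.
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        # The head outputs one continuous score per channel in [0, 1].
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, num_channels),
            nn.Sigmoid(),
        )

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim) past interference measurements.
        out, _ = self.lstm(obs_seq)
        # Use the final hidden state as the agent's belief about the
        # current interference situation.
        return self.head(out[:, -1, :])

if __name__ == "__main__":
    actor = LSTMActor(obs_dim=8, num_channels=16)
    history = torch.randn(1, 10, 8)   # 10 past measurement vectors
    candidate_action = actor(history)
    print(candidate_action.shape)     # torch.Size([1, 16])
```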
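The abstract's binary tree search for quantizing continuous candidate actions is described as novel and is not detailed in this record. As a stand-in under that caveat, the sketch below binary-searches a threshold on the continuous scores so that at least a minimum number of channels is selected, with `min_channels` as a hypothetical proxy for the reliability constraint.

```python
# Hedged stand-in for the paper's (unpublished) binary tree search:
# binary-search a score threshold so the resulting 0/1 channel selection
# keeps at least `min_channels` channels active.
import numpy as np

def quantize_action(scores: np.ndarray, min_channels: int,
                    iters: int = 20) -> np.ndarray:
    """Return a 0/1 channel-selection vector with >= min_channels ones,
    derived from continuous scores in [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if np.count_nonzero(scores >= mid) >= min_channels:
            lo = mid   # threshold is still feasible; try raising it
        else:
            hi = mid   # too strict; lower the threshold
    return (scores >= lo).astype(int)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.random(16)                      # continuous DDPG output
    action = quantize_action(scores, min_channels=4)
    print(action, int(action.sum()))
```

Each halving step discards half of the remaining threshold interval, so the search evaluates a logarithmic number of candidate thresholds rather than scoring every possible discrete action, which is the general appeal of tree-structured quantization in high-dimensional action spaces.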
ISSN: 2577-2465
DOI: 10.1109/VTC2024-Fall63153.2024.10757504