Learning-Based Scheduling: Contextual Bandits for Massive MIMO Systems

Bibliographic Details
Published in: 2020 IEEE International Conference on Communications Workshops (ICC Workshops), pp. 1-6
Main Authors: Mauricio, Weskley V. F.; Maciel, Tarcisio F.; Klein, Anja; Lima, F. Rafael M.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.06.2020

Summary: Recently, Reinforcement Learning (RL) solutions have emerged as a promising tool for solving wireless communications problems. In this work, the scheduling problem in multiuser massive Multiple Input Multiple Output (MIMO) systems is investigated using RL-based techniques, and we propose a novel approach that formulates multiuser scheduling in massive MIMO as a contextual bandit problem. The scheduler aims to maximize the system throughput while meeting Quality of Service (QoS) constraints across multiple services. First, we use the User Equipments' (UEs') spatial covariance matrices as the input to the K-means algorithm to split the UEs into spatially compatible clusters. Then, the scheduler defines each cluster as a virtual agent capable of making its own decisions, which drastically reduces the search space. Finally, the scheduler uses past information to learn how to satisfy the QoS requirements and maximize the system throughput. Our simulation results show that our solution outperforms a baseline algorithm, obtaining 22.5% more throughput and 20% higher system satisfaction. Furthermore, our solution also reduces the UEs' Channel State Information (CSI) feedback.
ISSN: 2474-9133
DOI: 10.1109/ICCWorkshops49005.2020.9145188
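As a rough, hypothetical sketch of the pipeline the summary describes (clustering UEs by their spatial covariance matrices and treating each cluster as a virtual bandit agent), the following Python snippet applies scikit-learn's K-means to synthetic covariance features and runs a per-cluster epsilon-greedy agent whose reward is a toy throughput-minus-QoS-penalty term. The array sizes, the reward model, and the epsilon-greedy rule are illustrative assumptions and are not taken from the paper.

# Hypothetical sketch only: synthetic data and a simplified reward model.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
N_UES, N_ANT, N_CLUSTERS = 24, 8, 4          # assumed sizes, not from the paper

# Step 1: per-UE spatial covariance matrices (random placeholders here),
# flattened so K-means can group spatially compatible UEs.
h = rng.standard_normal((N_UES, N_ANT, N_ANT))
cov = h @ h.transpose(0, 2, 1)               # positive semidefinite covariances
features = cov.reshape(N_UES, -1)
labels = KMeans(n_clusters=N_CLUSTERS, n_init=10, random_state=0).fit_predict(features)
members = {c: np.flatnonzero(labels == c) for c in range(N_CLUSTERS)}

# Step 2: each cluster is a virtual agent; its actions are the UEs it contains.
q = {c: np.zeros(len(members[c])) for c in range(N_CLUSTERS)}
n = {c: np.zeros(len(members[c])) for c in range(N_CLUSTERS)}
EPS = 0.1
mean_rate = rng.uniform(0.5, 1.5, size=N_UES)  # toy per-UE average rate

def toy_reward(ue):
    """Toy reward: noisy throughput minus an occasional QoS-violation penalty."""
    throughput = mean_rate[ue] + 0.1 * rng.standard_normal()
    qos_penalty = 0.3 if rng.random() < 0.1 else 0.0
    return throughput - qos_penalty

# Step 3: epsilon-greedy learning per cluster over simulated scheduling slots.
for tti in range(2000):
    for c in range(N_CLUSTERS):
        ues = members[c]
        if len(ues) == 0:
            continue
        a = rng.integers(len(ues)) if rng.random() < EPS else int(np.argmax(q[c]))
        r = toy_reward(ues[a])
        n[c][a] += 1
        q[c][a] += (r - q[c][a]) / n[c][a]     # incremental mean update

for c in range(N_CLUSTERS):
    best = members[c][int(np.argmax(q[c]))]
    print(f"cluster {c}: prefers UE {best} (Q = {q[c].max():.2f})")

In this simplification the cluster membership derived from the covariance clustering plays the role of the context, and each cluster's agent only has to choose among the handful of UEs assigned to it, which mirrors the search-space reduction mentioned in the summary; the paper's actual context, action, and reward definitions are not reproduced here.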