Dynamic SDN-Based Radio Access Network Slicing With Deep Reinforcement Learning for URLLC and eMBB Services

Bibliographic Details
Published in	IEEE Transactions on Network Science and Engineering, Vol. 9, no. 4, pp. 2174-2187
Main Authors	Filali, Abderrahime, Mlika, Zoubeir, Cherkaoui, Soumaya, Kobbane, Abdellatif
Format	Journal Article
Language	English
Published	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.07.2022
Subjects	5G mobile communication; Algorithms; Broadband; Deep learning; deep reinforcement learning; eMBB; Heuristic algorithms; Machine learning; Markov processes; Mobile computing; Multiagent systems; Network latency; Network slicing; Optimization; Quality of service; Radio; Radio access networks; Resource allocation; Resource management; Software-defined networking; Ultra reliable low latency communication; URLLC
Summary:	Radio access network (RAN) slicing is a key technology that enables 5G networks to support the heterogeneous requirements of generic services, namely ultra-reliable low-latency communication (URLLC) and enhanced mobile broadband (eMBB). In this paper, we propose a two-time-scale RAN slicing mechanism to optimize the performance of URLLC and eMBB services. On the large time scale, a software-defined networking (SDN) controller allocates radio resources to gNodeBs according to the requirements of the eMBB and URLLC services. On the short time scale, each gNodeB allocates its available resources to its end users and, if needed, requests additional resources from adjacent gNodeBs. We formulate this problem as a non-linear binary program and prove that it is NP-hard. Next, we model the problem on each time scale as a Markov decision process (MDP): a single-agent MDP for the large time scale and a multi-agent MDP for the short time scale. We use the exponential-weight algorithm for exploration and exploitation (EXP3) to solve the single-agent MDP of the large time scale and the multi-agent deep Q-learning (DQL) algorithm to solve the multi-agent MDP of the short-time-scale resource allocation. Extensive simulations show that our approach is efficient under different network parameter configurations and that it outperforms recent benchmark solutions.
ISSN:	2327-4697, 2334-329X
DOI:	10.1109/TNSE.2022.3157274
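
Illustration: the summary names EXP3 as the learner used by the SDN controller on the large time scale. The sketch below shows the generic textbook form of EXP3 in Python and is not the authors' implementation. This record does not give the paper's state, action, or reward definitions, so the arms (candidate shares of radio resources reserved for URLLC), the toy reward function, and all names (Exp3, toy_reward, run_toy) are assumptions made purely for illustration.

```python
import math
import random


class Exp3:
    """Generic EXP3 learner for an adversarial multi-armed bandit."""

    def __init__(self, n_arms, gamma=0.1, rng=None):
        if n_arms < 1:
            raise ValueError("n_arms must be positive")
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must be in (0, 1]")
        self.n_arms = n_arms
        self.gamma = gamma  # exploration rate
        self.weights = [1.0] * n_arms
        self.rng = rng if rng is not None else random.Random()

    def probabilities(self):
        """Mix the normalised weights with a uniform exploration term."""
        total = sum(self.weights)
        return [
            (1.0 - self.gamma) * w / total + self.gamma / self.n_arms
            for w in self.weights
        ]

    def select(self):
        """Sample an arm; return it with the probability it was drawn with."""
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n_arms), weights=probs, k=1)[0]
        return arm, probs[arm]

    def update(self, arm, prob, reward):
        """Update the chosen arm using an importance-weighted reward in [0, 1]."""
        if not 0.0 <= reward <= 1.0:
            raise ValueError("reward must be in [0, 1]")
        estimate = reward / prob  # unbiased estimate of the arm's reward
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n_arms)
        # Rescale so the largest weight is 1; probabilities are unchanged,
        # but the weights cannot overflow over long runs.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


# Hypothetical arms: share of radio resources reserved for URLLC.
URLLC_SHARES = [0.2, 0.4, 0.6, 0.8]


def toy_reward(share, rng):
    """Assumed toy reward in [0, 1], peaking at a share of 0.4 (illustrative only)."""
    noise = rng.uniform(-0.05, 0.05)
    return min(1.0, max(0.0, 1.0 - abs(share - 0.4) + noise))


def run_toy(rounds=2000, seed=0):
    """Run EXP3 on the toy reward and return the final arm probabilities."""
    rng = random.Random(seed)
    agent = Exp3(len(URLLC_SHARES), gamma=0.1, rng=rng)
    for _ in range(rounds):
        arm, prob = agent.select()
        agent.update(arm, prob, toy_reward(URLLC_SHARES[arm], rng))
    return agent.probabilities()
```

Calling run_toy() returns one selection probability per candidate share. Because of the uniform exploration term, every arm keeps a probability of at least gamma / n_arms, so the learner keeps sampling all allocations even after it starts to favour one.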