Deep Reinforcement Learning-Based Hierarchical Time Division Duplexing Control for Dense Wireless and Mobile Networks

Bibliographic Details
Published in: IEEE Transactions on Wireless Communications, Vol. 20, No. 11, pp. 7135–7150
Main Authors: Tuong, Van Dat; Dao, Nhu-Ngoc; Noh, Wonjong; Cho, Sungrae
Format: Journal Article
Language: English
Published: New York, IEEE, 01.11.2021
Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Future wireless and mobile network services must accommodate highly dynamic asymmetry between downlink and uplink traffic. To meet this requirement, the third-generation partnership project (3GPP) introduced the enhanced interference mitigation and traffic adaptation strategy in addition to dynamic time division duplexing (TDD). In this study, we develop a reinforcement learning (RL)-based dynamic TDD framework that effectively controls interference and serves diverse traffic demands. First, we introduce an interference-penalty model that evaluates interference indirectly from the duplexing policy. This approach can significantly reduce the overhead of measuring and exchanging channel information in a dense network. Second, we design a new mixed-reward model that combines the achievable data rate with the implicit interference penalty. Third, we implement deep RL algorithms with which base stations (BSs) learn their radio frame configurations (RFCs). The training process at each BS takes into account its traffic demand and the RFCs of the surrounding BSs. The BSs are coordinated in a single-leader multi-follower Stackelberg game, which yields a global RFC setup that maximizes the data rate and minimizes the interference. Extensive simulations show that the proposed framework converges stably in various environments and achieves near-optimal performance, reaching 95% or more of the optimal performance obtained by full search; this is 48.84%, 41.92%, and 62.11% higher than the currently used random RFC, fixed RFC, and traffic-matched RFC approaches, respectively.
ISSN: 1536-1276; 1558-2248
DOI: 10.1109/TWC.2021.3080990
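The summary above describes a mixed reward (achievable data rate minus an implicit interference penalty derived from the duplexing policy) and a single-leader multi-follower coordination among BSs. The following is a minimal, hypothetical Python sketch of that idea, not the authors' formulation. It assumes the candidate RFCs are the seven LTE TDD uplink-downlink configurations, uses demand-capped subframe counts as a stand-in for the achievable data rate, penalizes subframes in which a neighboring BS transmits in the opposite direction (cross-link interference), assumes a star topology around the leader, and replaces the trained deep RL policy with exhaustive search over the seven RFCs. All function names and the `weight` parameter are illustrative.

```python
# Hypothetical sketch (not the authors' model): mixed reward and
# leader-follower RFC selection for dynamic TDD.

# Assumption: candidate RFCs are the seven LTE TDD uplink-downlink
# configurations of 3GPP TS 36.211, Table 4.2-2 (D = downlink,
# U = uplink, S = special subframe), indexed 0..6.
RFCS = [
    "DSUUUDSUUU",  # config 0
    "DSUUDDSUUD",  # config 1
    "DSUDDDSUDD",  # config 2
    "DSUUUDDDDD",  # config 3
    "DSUUDDDDDD",  # config 4
    "DSUDDDDDDD",  # config 5
    "DSUUUDSUUD",  # config 6
]


def served_traffic(rfc: str, dl_demand: int, ul_demand: int) -> int:
    """Hypothetical rate proxy: DL and UL subframes per frame, each capped by demand."""
    return min(rfc.count("D"), dl_demand) + min(rfc.count("U"), ul_demand)


def interference_penalty(rfc: str, neighbor_rfcs: list[str]) -> int:
    """Hypothetical implicit penalty: uses no channel measurements, only the
    number of subframes in which a neighbor transmits in the opposite direction."""
    return sum(
        1
        for nb in neighbor_rfcs
        for mine, theirs in zip(rfc, nb)
        if {mine, theirs} == {"D", "U"}
    )


def mixed_reward(rfc, dl_demand, ul_demand, neighbor_rfcs, weight=0.5):
    """Rate proxy minus the weighted interference penalty (weight is illustrative)."""
    rate = served_traffic(rfc, dl_demand, ul_demand)
    return rate - weight * interference_penalty(rfc, neighbor_rfcs)


def best_response(dl_demand, ul_demand, neighbor_ids, weight=0.5):
    """Follower policy: exhaustive search over RFC indices, standing in for
    a trained deep RL agent. max() keeps the lowest index on ties."""
    neighbors = [RFCS[j] for j in neighbor_ids]
    return max(
        range(len(RFCS)),
        key=lambda i: mixed_reward(RFCS[i], dl_demand, ul_demand, neighbors, weight),
    )


def stackelberg(leader_demand, follower_demands, weight=0.5):
    """Leader picks the RFC that maximizes its own reward, anticipating each
    follower's best response. Assumes a star topology: each follower neighbors
    only the leader, and the leader neighbors every follower."""
    best = None
    for a_leader in range(len(RFCS)):
        responses = [best_response(dl, ul, [a_leader], weight) for dl, ul in follower_demands]
        follower_rfcs = [RFCS[a] for a in responses]
        reward = mixed_reward(RFCS[a_leader], *leader_demand, follower_rfcs, weight)
        if best is None or reward > best[0]:
            best = (reward, a_leader, responses)
    return best[1], best[2]


if __name__ == "__main__":
    # Demands are (DL subframes wanted, UL subframes wanted) per frame.
    leader_rfc, follower_rfcs = stackelberg((8, 1), [(8, 1), (2, 6)])
    print(leader_rfc, follower_rfcs)
```

Raising `weight` trades served traffic for fewer opposite-direction subframes between neighbors. Per the summary, the framework itself has BSs learn their RFCs with deep RL rather than the exhaustive search used here, which only mirrors the full-search baseline on a toy scale.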