Deep Reinforcement Learning-Based Hierarchical Time Division Duplexing Control for Dense Wireless and Mobile Networks

Bibliographic Details
Published in	IEEE Transactions on Wireless Communications Vol. 20; no. 11; pp. 7135-7150
Main Authors	Tuong, Van Dat; Dao, Nhu-Ngoc; Noh, Wonjong; Cho, Sungrae
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.11.2021
Subjects	5G mobile communication; Algorithms; Deep learning; Deep Q-learning; duplexing control; Game theory; Heuristic algorithms; intercell interference; Interference; Machine learning; Optimization; Radio equipment; radio frame configuration; reinforcement learning; Resource management; Stackelberg game; Switches; Time division; Wireless communication; Wireless networks

More Information
Summary:	Future wireless and mobile network services must accommodate highly dynamic asymmetry between downlink and uplink traffic. To meet this requirement, the Third Generation Partnership Project (3GPP) introduced the enhanced interference mitigation and traffic adaptation (eIMTA) strategy in addition to dynamic time division duplexing (TDD). In this study, we develop a reinforcement learning (RL)-based dynamic TDD framework that effectively controls interference and serves diverse traffic demands. First, we introduce an interference-penalty model that evaluates interference indirectly from the duplexing policy, which can significantly reduce the overhead of measuring and exchanging channel information in a dense network. Second, we design a new mixed-reward model that combines the achievable data rate with the implicit interference penalty. Third, we implement deep RL algorithms that base stations (BSs) use to train their radio frame configurations (RFCs); the training at each BS takes into account its traffic demand and the RFCs of the surrounding BSs. The BSs are coordinated through a single-leader multi-follower Stackelberg game, which yields a global RFC setup that maximizes the data rate and minimizes interference. Extensive simulations show that the proposed framework converges stably in various environments and achieves near-optimal performance, at least 95% of the optimum obtained by full search. This performance is 48.84%, 41.92%, and 62.11% higher than that of the currently used random RFC, fixed RFC, and traffic-matched RFC approaches, respectively.
ISSN:	1536-1276, 1558-2248
DOI:	10.1109/TWC.2021.3080990
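The summary describes the interference-penalty model, the mixed reward, and the Stackelberg coordination only at a high level. The Python sketch below illustrates how these pieces could fit together. It is not the authors' implementation. The following are assumptions. The candidate RFCs are taken to be the seven standard LTE TDD uplink-downlink configurations. The penalty counts subframes in which two BSs transmit in opposite directions. `CAPACITY_PER_SUBFRAME`, `PENALTY_WEIGHT`, the rate proxy `served_rate`, the star topology (followers interfere only with the leader), and the leader's objective (total mixed reward of all BSs) are all hypothetical. Exhaustive best responses stand in for the deep Q-networks, so the sketch shows the reward structure and the leader-follower ordering, not the RL training.

```python
"""Hypothetical sketch of a mixed reward and a single-leader, multi-follower
Stackelberg selection of TDD radio frame configurations (RFCs)."""

# Assumption: candidate RFCs are the seven LTE TDD uplink-downlink
# configurations (3GPP TS 36.211, Table 4.2-2).
# 'D' = downlink, 'U' = uplink, 'S' = special subframe.
RFCS = {
    0: "DSUUUDSUUU",
    1: "DSUUDDSUUD",
    2: "DSUDDDSUDD",
    3: "DSUUUDDDDD",
    4: "DSUUDDDDDD",
    5: "DSUDDDDDDD",
    6: "DSUUUDSUUD",
}

CAPACITY_PER_SUBFRAME = 1.0  # hypothetical traffic units served per D or U subframe
PENALTY_WEIGHT = 0.5  # hypothetical weight trading data rate against interference


def served_rate(rfc, dl_demand, ul_demand):
    """Hypothetical rate proxy: traffic served in each direction, capped by demand.
    Special subframes are ignored."""
    dl_capacity = rfc.count("D") * CAPACITY_PER_SUBFRAME
    ul_capacity = rfc.count("U") * CAPACITY_PER_SUBFRAME
    return min(dl_demand, dl_capacity) + min(ul_demand, ul_capacity)


def interference_penalty(rfc, neighbor_rfcs):
    """Implicit penalty: number of subframes in which this BS and a neighbour use
    opposite directions (one 'D', the other 'U'). Only the neighbours' RFCs are
    needed, not channel measurements."""
    return sum(
        1
        for other in neighbor_rfcs
        for mine, theirs in zip(rfc, other)
        if {mine, theirs} == {"D", "U"}
    )


def mixed_reward(rfc, neighbor_rfcs, dl_demand, ul_demand):
    """Mixed reward: rate proxy minus the weighted interference penalty."""
    penalty = interference_penalty(rfc, neighbor_rfcs)
    return served_rate(rfc, dl_demand, ul_demand) - PENALTY_WEIGHT * penalty


def follower_best_response(leader_rfc, demand):
    """A follower picks the RFC maximizing its own mixed reward given the leader's
    RFC (assumption: followers interfere only with the leader). Ties go to the
    lowest configuration index."""
    dl, ul = demand
    return max(RFCS, key=lambda c: mixed_reward(RFCS[c], [leader_rfc], dl, ul))


def stackelberg(leader_demand, follower_demands):
    """Backward induction by enumeration: for each leader RFC, compute every
    follower's best response, then keep the leader RFC with the highest total
    mixed reward over all BSs (assumed leader objective)."""
    best = None
    for leader_cfg, leader_rfc in RFCS.items():
        responses = [follower_best_response(leader_rfc, d) for d in follower_demands]
        follower_rfcs = [RFCS[c] for c in responses]
        total = mixed_reward(leader_rfc, follower_rfcs, *leader_demand)
        total += sum(
            mixed_reward(RFCS[c], [leader_rfc], *d)
            for c, d in zip(responses, follower_demands)
        )
        if best is None or total > best[0]:
            best = (total, leader_cfg, responses)
    return best


if __name__ == "__main__":
    # Leader is downlink-heavy; one follower is downlink-heavy, one uplink-heavy.
    total, leader_cfg, follower_cfgs = stackelberg((8, 2), [(8, 2), (3, 6)])
    print(f"leader RFC {leader_cfg}, follower RFCs {follower_cfgs}, total {total:.1f}")
```

Exhaustive enumeration is cheap in this toy setting because each BS has only seven RFCs to choose from. It is used here purely for illustration in place of the learned policies.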