Deep-Reinforcement-Learning-Based Spectrum Resource Management for Industrial Internet of Things
Published in | IEEE Internet of Things Journal, vol. 8, no. 5, pp. 3476-3489 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.03.2021 |
Summary: | The Industrial Internet of Things (IIoT) has attracted tremendous interest from both industry and academia, as it can significantly improve production efficiency and system intelligence. However, with the explosive growth of various types of user equipment (UE) and data flows, IIoT faces a scarcity of spectrum resources for wireless applications. In this article, we propose a spectrum resource management solution for the IIoT network, with the objective of facilitating the sharing of limited spectrum among different kinds of UEs. To overcome the challenges of unknown, dynamic IIoT environments, a modified deep Q-learning network (MDQN) is developed. Considering the cost-effectiveness of IIoT devices, the base station (BS) acts as a single agent and centrally manages the spectrum resources, which can be executed without coordination or information exchange between UEs. We first build a realistic IIoT model and design a simple medium access control (MAC) frame structure to facilitate observation of the environment state. Then, a new reward function is designed to drive the learning process, which takes into account the different communication requirements of the various types of UEs. In addition, to improve learning efficiency, we compress the action space and propose a priority experience replay strategy based on decreasing temporal difference (TD) error. Finally, simulation results show that the proposed algorithm successfully achieves dynamic spectrum resource management in the IIoT network. Compared with other algorithms, it achieves superior network performance with a faster convergence rate. |
---|---|
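The summary mentions a priority experience replay strategy based on decreasing TD error. The paper's exact scheme is not given in this record, but the general idea of replaying transitions in proportion to their absolute TD error can be sketched as follows; the class name, `alpha` exponent, and eviction policy here are illustrative assumptions, not the authors' implementation:

```python
import random

class PrioritizedReplayBuffer:
    """Sketch of TD-error-proportional experience replay: transitions with
    larger |TD error| are sampled more often, and priorities are refreshed
    (typically decreasing) as the Q-network improves."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # 0 = uniform sampling, 1 = fully prioritized
        self.eps = eps          # keeps every priority strictly positive
        self.buffer = []        # stored transitions
        self.priorities = []    # one priority per stored transition

    def add(self, transition, td_error):
        priority = (abs(td_error) + self.eps) ** self.alpha
        if len(self.buffer) >= self.capacity:   # evict the oldest when full
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        # Draw indices with probability proportional to stored priority.
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idxs = random.choices(range(len(self.buffer)), weights=probs, k=batch_size)
        return idxs, [self.buffer[i] for i in idxs]

    def update_priorities(self, idxs, td_errors):
        # After a learning step, refresh priorities with the new TD errors,
        # which shrink as the value estimates converge.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = (abs(err) + self.eps) ** self.alpha
```

In a single-agent setting such as the BS described above, the agent would add each (state, action, reward, next-state) transition with its current TD error, sample minibatches from this buffer for Q-network updates, and then call `update_priorities` with the recomputed errors.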
ISSN: | 2327-4662 |
DOI: | 10.1109/JIOT.2020.3022861 |