Deep-Reinforcement-Learning-Based Spectrum Resource Management for Industrial Internet of Things

Bibliographic Details
Published in: IEEE Internet of Things Journal, vol. 8, no. 5, pp. 3476-3489
Main Authors: Shi, Zhaoyuan; Xie, Xianzhong; Lu, Huabing; Yang, Helin; Kadoch, Michel; Cheriet, Mohamed
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.03.2021

Summary: The Industrial Internet of Things (IIoT) has attracted tremendous interest from both industry and academia because it can significantly improve production efficiency and system intelligence. However, with the explosive growth of various types of user equipment (UE) and data flows, IIoT networks face spectrum resource scarcity for wireless applications. In this article, we propose a spectrum resource management solution for the IIoT network, with the objective of facilitating the sharing of limited spectrum between different kinds of UEs. To overcome the challenges of unknown, dynamic IIoT environments, a modified deep Q-learning network (MDQN) is developed. Considering the cost-effectiveness of IIoT devices, the base station (BS) acts as a single agent and centrally manages the spectrum resources, so the scheme can be executed without coordination or information exchange between UEs. We first build a realistic IIoT model and design a simple medium access control (MAC) frame structure to facilitate observation of the environment state. Then, a new reward function is designed to drive the learning process, taking into account the different communication requirements of the various types of UEs. In addition, to improve learning efficiency, we compress the action space and propose a priority experience replay strategy based on decreasing temporal difference (TD) error. Finally, simulation results show that the proposed algorithm successfully achieves dynamic spectrum resource management in the IIoT network and, compared with other algorithms, delivers superior network performance with a faster convergence rate.
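
To make the replay mechanism in the summary concrete, below is a minimal Python sketch of a TD-error-proportional experience replay buffer of the kind described: transition priorities track the absolute TD error, so priorities decrease as learning reduces the error. The class name TDPriorityReplayBuffer and the parameters alpha and eps are illustrative assumptions, not taken from the paper, and the sketch omits the MDQN network and the IIoT-specific state, action, and reward definitions.

import numpy as np

class TDPriorityReplayBuffer:
    # Samples transitions with probability proportional to |TD error|^alpha.
    # As training shrinks a transition's TD error, its priority decreases,
    # so already well-fit experiences are replayed less often over time.
    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha            # strength of prioritization (0 = uniform)
        self.eps = eps                # keeps every priority strictly positive
        self.buffer = []              # stored (s, a, r, s_next, done) tuples
        self.priorities = []          # one priority per stored transition

    def push(self, transition, td_error):
        # Insert a new transition with priority (|delta| + eps)^alpha.
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)        # evict the oldest transition
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size):
        # Draw indices in proportion to the current priorities.
        probs = np.asarray(self.priorities)
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return idx, [self.buffer[i] for i in idx]

    def update_priorities(self, idx, td_errors):
        # Called after a gradient step: the recomputed TD errors are typically
        # smaller, so these transitions' priorities decrease.
        for i, delta in zip(idx, td_errors):
            self.priorities[i] = (abs(delta) + self.eps) ** self.alpha

In a DQN training loop, push would be called with the TD error computed when the transition is stored, sample would supply each minibatch, and update_priorities would be called with the freshly recomputed errors after each gradient step.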
ISSN: 2327-4662
DOI: 10.1109/JIOT.2020.3022861