Deep Deterministic Policy Gradient with Prioritized Sampling for Power Control

Bibliographic Details
Published in: IEEE Access, Vol. 8; p. 1
Main Authors: Zhou, Shiyang; Cheng, Yufan; Lei, Xia; Duan, Huanhuan
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2020

Summary: Reinforcement learning has been applied to power control in wireless communications. However, most prior work adopts the deep Q-network (DQN) scheme, which outputs a Q-value for each discrete action and therefore does not match the continuous nature of the power control problem. Hence, this paper proposes a deep deterministic policy gradient (DDPG) scheme for power control. The power selection policy, designated the actor, is approximated by a convolutional neural network (CNN), and the policy evaluation, designated the critic, is approximated by a fully connected network. These deep neural networks enable fast decision-making for large-scale power control problems. Moreover, to speed up training, the paper proposes a prioritized sampling technique, which samples the experiences that most need to be learned with higher probability. The proposed algorithm is simulated in a multiple sweep interference (MSI) scenario. The simulation results show that the DDPG scheme is more likely to reach the optimal policy than the DQN scheme, and that the DDPG scheme with prioritized sampling (DDPG-PS) converges faster than the DDPG scheme with uniform sampling.
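
To make the prioritized sampling idea concrete, the following minimal Python/NumPy sketch shows a proportional prioritized replay buffer of the kind the summary describes. It is an illustration under common conventions, not the authors' code: the class name, the alpha/beta/eps hyper-parameters, and the rule that priority is proportional to the absolute TD error are assumptions made here for the example.

    import numpy as np

    class PrioritizedReplayBuffer:
        """Proportional prioritized experience replay (illustrative sketch)."""

        def __init__(self, capacity, alpha=0.6, eps=1e-6):
            self.capacity = capacity
            self.alpha = alpha                    # strength of prioritization (0 = uniform)
            self.eps = eps                        # keeps every priority strictly positive
            self.storage = []                     # stores (s, a, r, s_next, done) tuples
            self.priorities = np.zeros(capacity)  # one priority per slot
            self.pos = 0                          # next write position (ring buffer)

        def add(self, transition):
            # New experiences get the current maximum priority so they are
            # sampled at least once before their TD error is known.
            max_prio = self.priorities.max() if self.storage else 1.0
            if len(self.storage) < self.capacity:
                self.storage.append(transition)
            else:
                self.storage[self.pos] = transition
            self.priorities[self.pos] = max_prio
            self.pos = (self.pos + 1) % self.capacity

        def sample(self, batch_size, beta=0.4):
            prios = self.priorities[:len(self.storage)]
            probs = prios ** self.alpha
            probs /= probs.sum()
            idx = np.random.choice(len(self.storage), batch_size, p=probs)
            # Importance-sampling weights correct the bias introduced by
            # non-uniform sampling; they would scale the critic's loss terms.
            weights = (len(self.storage) * probs[idx]) ** (-beta)
            weights /= weights.max()
            batch = [self.storage[i] for i in idx]
            return batch, idx, weights

        def update_priorities(self, idx, td_errors):
            # Experiences with large |TD error| "need to be learned" and are
            # therefore assigned higher priority for future sampling.
            self.priorities[idx] = np.abs(td_errors) + self.eps

In a DDPG training loop, the critic's TD errors for each sampled batch would be passed back through update_priorities, so transitions that the critic currently predicts poorly are revisited more often than under uniform sampling.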
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3033333