Deep Deterministic Policy Gradient with Prioritized Sampling for Power Control

Bibliographic Details
Published in: IEEE Access, Vol. 8; p. 1
Main Authors: Zhou, Shiyang; Cheng, Yufan; Lei, Xia; Duan, Huanhuan
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2020

Summary: Reinforcement learning has been applied to power control in wireless communications. However, most prior work adopts the deep Q-network (DQN) scheme, which outputs a Q-value for each discrete action and therefore does not match the continuous nature of the power control problem. Hence, this paper proposes a deep deterministic policy gradient (DDPG) scheme for power control. The power selection policy, designated the actor, is approximated by a convolutional neural network (CNN), and the policy evaluation, designated the critic, is approximated by a fully connected network. These deep neural networks enable fast decision-making for large-scale power control problems. Moreover, to speed up training, the paper proposes a prioritized sampling technique, which samples the experiences that most need to be learned with higher probability. The proposed algorithm is simulated in a multiple sweep interference (MSI) scenario. The simulation results show that the DDPG scheme is more likely to reach the optimal policy than the DQN scheme, and that the DDPG scheme with prioritized sampling (DDPG-PS) converges faster than the DDPG scheme with uniform sampling.
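
To make the prioritized sampling idea concrete, the following minimal Python/NumPy sketch shows a proportional prioritized replay buffer of the kind the summary describes. It is an illustration under common conventions, not the authors' code: the class name, the alpha/beta/eps hyper-parameters, and the rule that priority is proportional to the absolute TD error are assumptions made here for the example.

    import numpy as np

    class PrioritizedReplayBuffer:
        """Proportional prioritized experience replay (illustrative sketch)."""

        def __init__(self, capacity, alpha=0.6, eps=1e-6):
            self.capacity = capacity
            self.alpha = alpha                    # strength of prioritization (0 = uniform)
            self.eps = eps                        # keeps every priority strictly positive
            self.storage = []                     # stores (s, a, r, s_next, done) tuples
            self.priorities = np.zeros(capacity)  # one priority per slot
            self.pos = 0                          # next write position (ring buffer)

        def add(self, transition):
            # New experiences get the current maximum priority so they are
            # sampled at least once before their TD error is known.
            max_prio = self.priorities.max() if self.storage else 1.0
            if len(self.storage) < self.capacity:
                self.storage.append(transition)
            else:
                self.storage[self.pos] = transition
            self.priorities[self.pos] = max_prio
            self.pos = (self.pos + 1) % self.capacity

        def sample(self, batch_size, beta=0.4):
            prios = self.priorities[:len(self.storage)]
            probs = prios ** self.alpha
            probs /= probs.sum()
            idx = np.random.choice(len(self.storage), batch_size, p=probs)
            # Importance-sampling weights correct the bias introduced by
            # non-uniform sampling; they would scale the critic's loss terms.
            weights = (len(self.storage) * probs[idx]) ** (-beta)
            weights /= weights.max()
            batch = [self.storage[i] for i in idx]
            return batch, idx, weights

        def update_priorities(self, idx, td_errors):
            # Experiences with large |TD error| "need to be learned" and are
            # therefore assigned higher priority for future sampling.
            self.priorities[idx] = np.abs(td_errors) + self.eps

In a DDPG training loop, the critic's TD errors for each sampled batch would be passed back through update_priorities, so transitions that the critic currently predicts poorly are revisited more often than under uniform sampling.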
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3033333