Throughput Maximization With an AoI Constraint in Energy Harvesting D2D-Enabled Cellular Networks: An MSRA-TD3 Approach

Bibliographic Details
Published in: IEEE Transactions on Wireless Communications, Vol. 24, No. 2, pp. 1448-1466
Main Authors: Liu, Xiaoying; Xu, Jiaxiang; Zheng, Kechen; Zhang, Guanglin; Liu, Jia; Shiratori, Norio
Format: Journal Article
Language: English
Published: New York, IEEE, 01.02.2025
Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)

Summary: The energy harvesting D2D-enabled cellular network (EH-DCN) has emerged as a promising approach to address the issues of energy supply and spectrum utilization. Most existing works focus mainly on throughput, while information freshness, which is critical to time-sensitive applications, has rarely been explored. Considering these facts, we aim to develop an optimal mode selection and resource allocation (MSRA) policy that maximizes the long-term overall throughput of a time-varying dynamic EH-DCN, subject to an age of information (AoI) constraint. As the MSRA policy involves both continuous variables (i.e., bandwidth, power, and time allocations) and discrete variables (i.e., mode selection and channel allocation), the optimization problem is proved to be nonconvex and NP-hard. To solve this nonconvex NP-hard problem, we exploit a deep reinforcement learning (DRL) approach, called MSRA twin delayed deep deterministic policy gradient (MSRA-TD3). MSRA-TD3 employs a double critic network structure to better fit the reward function, and effectively mitigates the overestimation of the Q-value in deep deterministic policy gradient (DDPG), a classical DRL algorithm. Notably, in the design of MSRA-TD3, we use the throughput of user equipments (UEs) at the previous time slot as a state to bypass the channel state information estimation required by the time-varying dynamic environment, and we incorporate weights for the throughput and the AoI penalty into the reward function to evaluate both performance metrics. Simulations demonstrate that the MSRA-TD3 algorithm achieves better throughput and AoI performance than comparative DRL algorithms.
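Two mechanisms named in the summary can be sketched in a few lines: TD3's clipped double-critic target, which takes the minimum of two critic estimates to curb the Q-value overestimation seen in DDPG, and a reward that weights throughput against an AoI penalty. The following is a minimal illustrative sketch, not the paper's implementation; all function names, weights, and numbers are assumptions for illustration.

```python
import numpy as np

def td3_target(reward, gamma, q1_next, q2_next, done):
    """Bellman target using the minimum of the two critics' next-state
    estimates (TD3's clipped double-Q trick against overestimation)."""
    min_q = np.minimum(q1_next, q2_next)
    return reward + gamma * (1.0 - done) * min_q

def weighted_reward(throughput, aoi_penalty, w_tp=1.0, w_aoi=0.5):
    """Illustrative reward trading off throughput against an AoI penalty;
    the weights w_tp and w_aoi are hypothetical, not from the paper."""
    return w_tp * throughput - w_aoi * aoi_penalty

# Hypothetical numbers: a UE delivers 2.0 units of throughput while
# incurring an AoI penalty of 1.0 in the current slot.
r = weighted_reward(throughput=2.0, aoi_penalty=1.0)   # 2.0 - 0.5*1.0 = 1.5
y = td3_target(r, gamma=0.99, q1_next=10.0, q2_next=8.0, done=0.0)
# y = 1.5 + 0.99 * min(10.0, 8.0) = 9.42
```

Taking the minimum of the two critics biases the target low rather than high, which is the property the summary credits with mitigating DDPG's Q-value overestimation.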
ISSN: 1536-1276, 1558-2248
DOI: 10.1109/TWC.2024.3509475