Throughput Maximization With an AoI Constraint in Energy Harvesting D2D-Enabled Cellular Networks: An MSRA-TD3 Approach
Published in: IEEE Transactions on Wireless Communications, Vol. 24, No. 2, pp. 1448–1466
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.02.2025
Summary: The energy harvesting D2D-enabled cellular network (EH-DCN) has emerged as a promising approach to addressing the issues of energy supply and spectrum utilization. Most existing works focus mainly on throughput, while information freshness, which is critical to time-sensitive applications, has rarely been explored. Motivated by these facts, we aim to develop an optimal mode selection and resource allocation (MSRA) policy that maximizes the long-term overall throughput of a time-varying dynamic EH-DCN, subject to an age of information (AoI) constraint. Because the MSRA policy involves both continuous variables (i.e., bandwidth, power, and time allocations) and discrete variables (i.e., mode selection and channel allocation), the optimization problem is proved to be nonconvex and NP-hard. To solve this nonconvex NP-hard problem, we exploit a deep reinforcement learning (DRL) approach, called MSRA twin delayed deep deterministic policy gradient (MSRA-TD3). MSRA-TD3 employs a double critic network structure to better fit the reward function and effectively mitigates the overestimation of the Q-value in deep deterministic policy gradient (DDPG), a classical DRL algorithm. Notably, in the design of MSRA-TD3, we use the throughput of user equipments (UEs) at the previous time slot as a state to bypass the channel state information estimation required by the time-varying dynamic environment, and we incorporate weights for throughput and the AoI penalty into the reward function to evaluate both performance metrics. Simulations demonstrate that the proposed MSRA-TD3 algorithm achieves better performance in terms of throughput and AoI than the baseline DRL algorithms.
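The two mechanisms the abstract highlights can be sketched briefly: TD3's double critic structure mitigates Q-value overestimation by bootstrapping from the minimum of the two critics' next-state estimates, and the reward mixes throughput with an AoI penalty via weights. The function names, weight values, and reward shape below are illustrative assumptions, not the paper's actual implementation.

```python
def td3_target_q(q1_next, q2_next, reward, gamma=0.99, done=False):
    """Bellman target using the minimum of the twin critics' next-state
    estimates; taking the min curbs the Q-value overestimation that a
    single-critic method such as DDPG tends to accumulate."""
    q_next = min(q1_next, q2_next)
    return reward + gamma * (0.0 if done else q_next)


def msra_reward(throughput, aoi_penalty, w_tp=1.0, w_aoi=0.5):
    """Illustrative weighted reward combining throughput and an AoI
    penalty; w_tp and w_aoi are placeholder weights, not the paper's."""
    return w_tp * throughput - w_aoi * aoi_penalty


# With twin estimates 2.0 and 1.0, the target bootstraps from the smaller.
target = td3_target_q(q1_next=2.0, q2_next=1.0, reward=0.5, gamma=0.9)
```

Tuning `w_aoi` relative to `w_tp` trades raw throughput against information freshness, which is how a single scalar reward can enforce an AoI-aware policy.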
ISSN: 1536-1276, 1558-2248
DOI: 10.1109/TWC.2024.3509475