Exploiting Deep Reinforcement Learning for Stochastic AoI Minimization in Multi-UAV-assisted Wireless Networks

Bibliographic Details
Published in	2024 IEEE Wireless Communications and Networking Conference (WCNC) pp. 1 - 6
Main Authors	Long, Yusi; Zhuang, Jialin; Gong, Shimin; Gu, Bo; Xu, Jing; Deng, Jing
Format	Conference Proceeding
Language	English
Published	IEEE 21.04.2024
Subjects	backscatter; Deep reinforcement learning; DRL; Lyapunov optimization; Minimization; NOMA; Sensors; Stability analysis; trajectory planning; UAV; Wireless networks; Wireless sensor networks
Summary:	In this paper, we consider a multi-unmanned aerial vehicle (UAV)-assisted wireless sensing network, in which low-power ground users (GUs) periodically sense environmental information and upload their most recent sensing data to a base station (BS). The GUs first backscatter their information to the UAVs, which then forward it to the BS using non-orthogonal multiple access (NOMA) transmissions. Our goal is to minimize the long-term age of information (AoI) by jointly optimizing the UAVs' sensing scheduling, transmission control, and trajectories. To solve this problem, we propose a Lyapunov-driven hierarchical proximal policy optimization framework, named Lya-HPPO, which decouples the multi-stage AoI minimization problem into several control subproblems. In each control subproblem, the UAVs' sensing scheduling and transmission control are first determined by an outer-loop deep reinforcement learning (DRL) approach, and an inner-loop optimization module then updates the UAVs' trajectories. Simulation results verify that the proposed Lya-HPPO framework converges quickly to a stable value and can make online decisions in real time, while guaranteeing long-term data buffer and AoI stability.
ISSN:	1558-2612
DOI:	10.1109/WCNC57260.2024.10570857
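As a rough, non-authoritative illustration of the Lyapunov drift-plus-penalty idea that the summary says Lya-HPPO builds on, the sketch below models per-GU AoI and per-UAV data buffers in discrete time and, in each slot, picks the candidate action minimizing V * (average AoI) + sum_k Q_k * (a_k - b_k). Everything here is an assumption rather than the authors' formulation: the AoI model (age resets to 1 on delivery, otherwise grows by one slot), the queue update Q(t+1) = max(Q(t) - b(t), 0) + a(t), the weight V, the toy action set, and all names (`update_aoi`, `update_queue`, `Action`, `drift_plus_penalty`, `choose_action`) are hypothetical. Exhaustive search over a tiny action set stands in for the outer-loop PPO policy, and the inner-loop trajectory optimization is omitted entirely.

```python
from dataclasses import dataclass
from typing import List, Sequence


def update_aoi(ages: Sequence[int], delivered: Sequence[bool]) -> List[int]:
    """Hypothetical AoI model: reset to 1 on delivery, else age by one slot."""
    return [1 if d else a + 1 for a, d in zip(ages, delivered)]


def update_queue(backlog: float, arrivals: float, service: float) -> float:
    """Standard Lyapunov-style buffer update: Q(t+1) = max(Q(t) - b(t), 0) + a(t)."""
    return max(backlog - service, 0.0) + arrivals


@dataclass
class Action:
    """Hypothetical per-slot decision (stand-in for scheduling + transmission control)."""
    name: str
    delivered: List[bool]  # which GUs get a fresh update delivered this slot
    service: List[float]   # data drained from each UAV buffer this slot


def drift_plus_penalty(action: Action, ages: Sequence[int],
                       backlogs: Sequence[float], arrivals: Sequence[float],
                       V: float) -> float:
    # Penalty term: average AoI after applying the action.
    new_ages = update_aoi(ages, action.delivered)
    penalty = sum(new_ages) / len(new_ages)
    # Usual upper bound on the one-slot drift of L(Q) = 0.5 * sum Q_k^2,
    # keeping only the action-dependent part: sum Q_k * (a_k - b_k).
    drift = sum(q * (a - b) for q, a, b in zip(backlogs, arrivals, action.service))
    return V * penalty + drift


def choose_action(actions: Sequence[Action], ages: Sequence[int],
                  backlogs: Sequence[float], arrivals: Sequence[float],
                  V: float = 1.0) -> Action:
    # Exhaustive search here; the paper instead uses a learned (PPO) policy.
    return min(actions, key=lambda act: drift_plus_penalty(act, ages, backlogs, arrivals, V))


if __name__ == "__main__":
    acts = [
        Action("serve_gu0", delivered=[True, False], service=[2.0, 0.0]),
        Action("serve_gu1", delivered=[False, True], service=[0.0, 2.0]),
    ]
    for V in (1.0, 10.0):
        best = choose_action(acts, ages=[3, 5], backlogs=[4.0, 1.0], arrivals=[1.0, 1.0], V=V)
        print(V, best.name)
```

With these toy numbers, V = 1 favors `serve_gu0`, which drains the larger backlog (score 0.5 vs. 5.5), while V = 10 weights AoI more heavily and favors `serve_gu1` (score 28 vs. 32). This is the usual drift-plus-penalty trade-off between buffer stability and the AoI objective, shown here only as an illustration.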