Multi-Agent Reinforcement Learning Based 3D Trajectory Design in Aerial-Terrestrial Wireless Caching Networks
Published in: IEEE Transactions on Vehicular Technology, Vol. 70, No. 8, pp. 8201-8215
Main Authors: , , , ,
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 1 August 2021
Summary: This paper investigates a dynamic 3D trajectory design of multiple cache-enabled unmanned aerial vehicles (UAVs) in a wireless device-to-device (D2D) caching network with the goal of maximizing the long-term network throughput. By storing popular content at the nearby mobile user devices, D2D caching is an efficient method to improve network throughput and alleviate backhaul burden. With the attractive features of high mobility and flexible deployment, UAVs have recently attracted significant attention as cache-enabled flying base stations. The use of cache-enabled UAVs opens up the possibility of tracking the mobility pattern of the corresponding users and serving them under limited cache storage capacity. However, it is challenging to determine the optimal UAV trajectory due to the dynamic environment with frequently changing network topology and the coexistence of aerial and terrestrial caching nodes. In response, we propose a novel multi-agent reinforcement learning based framework to determine the optimal 3D trajectory of each UAV in a distributed manner without a central coordinator. In the proposed method, multiple UAVs can cooperatively make flight decisions by sharing the gained experiences within a certain proximity to each other. Simulation results reveal that our algorithm outperforms the traditional single- and multi-agent Q-learning algorithms. This work confirms the feasibility and effectiveness of cache-enabled UAVs which serve as an important complement to terrestrial D2D caching nodes.
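The summary describes the key mechanism, namely distributed Q-learning agents that share gained experiences with other UAVs within a certain proximity, only at a high level. The following Python sketch is a rough, hypothetical illustration of that idea in a tabular setting and is not the paper's algorithm: the class and function names (`UAVAgent`, `share_experience`), the discrete 3D action set, the sharing radius, and the placeholder reward are all illustrative assumptions.

```python
"""Minimal sketch of proximity-based experience sharing among multiple
tabular Q-learning UAV agents. Not the paper's algorithm; the state/action
discretisation, reward, and sharing rule are illustrative assumptions."""
import random
from collections import defaultdict

import numpy as np

# Discrete 3D moves (+/- x, y, z) plus hover -- an assumed action set.
ACTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0),
           (0, 0, 1), (0, 0, -1), (0, 0, 0)]


class UAVAgent:
    def __init__(self, pos, alpha=0.1, gamma=0.9, eps=0.1):
        self.pos = np.array(pos)                      # discrete 3D grid position
        self.q = defaultdict(lambda: np.zeros(len(ACTIONS)))
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.recent = []                              # transitions kept for sharing

    def act(self, state):
        # Epsilon-greedy action selection over the tabular Q-values.
        if random.random() < self.eps:
            return random.randrange(len(ACTIONS))
        return int(np.argmax(self.q[state]))

    def learn(self, s, a, r, s2):
        # Standard tabular Q-learning update; the transition is also stored
        # so it can later be shared with nearby UAVs.
        td = r + self.gamma * np.max(self.q[s2]) - self.q[s][a]
        self.q[s][a] += self.alpha * td
        self.recent.append((s, a, r, s2))


def share_experience(agents, radius=5.0):
    """Each agent replays the recent transitions of neighbours within
    `radius` (an assumed sharing rule standing in for the paper's scheme)."""
    for i, a_i in enumerate(agents):
        for j, a_j in enumerate(agents):
            if i != j and np.linalg.norm(a_i.pos - a_j.pos) <= radius:
                for s, a, r, s2 in a_j.recent:
                    td = r + a_i.gamma * np.max(a_i.q[s2]) - a_i.q[s][a]
                    a_i.q[s][a] += a_i.alpha * td
    for ag in agents:
        ag.recent.clear()


# Toy usage with a placeholder reward of 1.0 per step:
agents = [UAVAgent(pos=(0, 0, 5)), UAVAgent(pos=(3, 1, 5))]
for ag in agents:
    s = tuple(ag.pos)
    a = ag.act(s)
    ag.pos = ag.pos + np.array(ACTIONS[a])            # move on the discrete grid
    ag.learn(s, a, r=1.0, s2=tuple(ag.pos))
share_experience(agents, radius=5.0)
```

The sketch only aims to show why proximity-limited sharing keeps the scheme distributed: each UAV updates its own Q-table and exchanges experience locally, so no central coordinator is required, which matches the framing in the summary.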
ISSN: 0018-9545, 1939-9359
DOI: 10.1109/TVT.2021.3094273