Enabling Efficient Scheduling in Large-Scale UAV-Assisted Mobile-Edge Computing via Hierarchical Reinforcement Learning

Bibliographic Details
Published in IEEE Internet of Things Journal Vol. 9; no. 10; pp. 7095-7109
Main Authors Ren, Tao; Niu, Jianwei; Dai, Bin; Liu, Xuefeng; Hu, Zheyuan; Xu, Mingliang; Guizani, Mohsen
Format Journal Article
Language English
Published Piscataway IEEE 15.05.2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Owing to their high maneuverability and flexibility, unmanned aerial vehicles (UAVs) are considered a promising means of assisting mobile edge computing (MEC) in many scenarios, including disaster rescue and field operations. Most existing research studies trajectory and computation-offloading scheduling for UAV-assisted MEC in stationary environments, and could face challenges in dynamic environments where the locations of UAVs and mobile devices (MDs) vary significantly. Some recent research attempts to develop scheduling policies for dynamic environments by means of reinforcement learning (RL). However, because these approaches must explore high-dimensional state and action spaces, they may fail to cover large-scale networks where multiple UAVs serve numerous MDs. To address this challenge, we leverage the idea of "divide-and-conquer" and propose HT3O, a scalable scheduling approach for large-scale UAV-assisted MEC. First, HT3O is built with neural networks via deep RL to obtain real-time scheduling policies for MEC in dynamic environments. More importantly, to make HT3O more scalable, we decompose the scheduling problem into two layers of subproblems and optimize them alternately via hierarchical RL. This not only substantially reduces the complexity of each subproblem but also improves convergence efficiency. Experimental results show that HT3O achieves promising performance improvements over state-of-the-art approaches.
ISSN: 2327-4662
DOI: 10.1109/JIOT.2021.3071531