Trajectory Design for UAV-Based Internet of Things Data Collection: A Deep Reinforcement Learning Approach

Bibliographic Details
Published in: IEEE Internet of Things Journal, Vol. 9, No. 5, pp. 3899-3912
Main Authors: Wang, Yang; Gao, Zhen; Zhang, Jun; Cao, Xianbin; Zheng, Dezhi; Gao, Yue; Ng, Derrick Wing Kwan; Di Renzo, Marco
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.03.2022
Summary: In this article, we investigate an unmanned aerial vehicle (UAV)-assisted Internet of Things (IoT) system in a sophisticated 3-D environment, where the UAV's trajectory is optimized to efficiently collect data from multiple IoT ground nodes. Unlike existing approaches, which focus only on a simplified 2-D scenario and assume the availability of perfect channel state information (CSI), this article considers a practical 3-D urban environment with imperfect CSI, where the UAV's trajectory is designed to minimize the data collection completion time subject to practical throughput and flight movement constraints. Specifically, inspired by state-of-the-art deep reinforcement learning approaches, we leverage the twin-delayed deep deterministic policy gradient (TD3) to design the UAV's trajectory, and we present a TD3-based trajectory design for completion time minimization (TD3-TDCTM) algorithm. In particular, we introduce an additional piece of information, the merged pheromone, which represents the state of the UAV and the environment and serves as a reference for the reward, facilitating the algorithm design. By taking the service statuses of the IoT nodes, the UAV's position, and the merged pheromone as input, the proposed algorithm can continuously and adaptively learn how to adjust the UAV's movement strategy. By interacting with the external environment in the corresponding Markov decision process, the proposed algorithm can achieve a near-optimal navigation strategy. Our simulation results show the superiority of the proposed TD3-TDCTM algorithm over three conventional nonlearning-based baseline methods.
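The summary describes a Markov decision process whose observation concatenates the IoT nodes' service statuses, the UAV's 3-D position, and the merged pheromone. As a rough illustration only (not the paper's actual formulation), a toy environment exposing such a state might be sketched as follows; the class name, the service-radius and step-size parameters, the reward shaping, and the exponential-decay pheromone update are all assumptions made for the sketch.

```python
import numpy as np

class UavCollectEnv:
    """Toy data-collection MDP: a UAV moves in 3-D and serves IoT nodes
    that fall within a fixed service radius. Illustrative only; the
    parameters and update rules are not taken from the paper."""

    def __init__(self, node_positions, serve_radius=20.0, step_size=10.0):
        self.nodes = np.asarray(node_positions, dtype=float)  # shape (N, 3)
        self.serve_radius = serve_radius
        self.step_size = step_size
        self.reset()

    def reset(self):
        self.uav = np.zeros(3)                                # start at origin
        self.served = np.zeros(len(self.nodes), dtype=bool)   # service statuses
        self.pheromone = 0.0   # crude scalar stand-in for the merged pheromone
        return self.state()

    def state(self):
        # Observation = [service statuses, UAV position, merged pheromone],
        # mirroring the input described in the abstract.
        return np.concatenate(
            [self.served.astype(float), self.uav, [self.pheromone]]
        )

    def step(self, direction):
        # Continuous action: a 3-D movement direction, scaled to step_size.
        d = np.asarray(direction, dtype=float)
        norm = np.linalg.norm(d)
        if norm > 0:
            self.uav = self.uav + self.step_size * d / norm
        # Mark nodes within the service radius as served.
        dist = np.linalg.norm(self.nodes - self.uav, axis=1)
        newly = (~self.served) & (dist <= self.serve_radius)
        self.served |= newly
        # Assumed pheromone rule: decay, plus a deposit for new coverage.
        self.pheromone = 0.9 * self.pheromone + newly.sum()
        done = bool(self.served.all())
        reward = float(newly.sum()) - 0.01   # progress minus a time penalty
        return self.state(), reward, done
```

A TD3 agent would then learn a deterministic policy mapping this state vector to movement actions; the small per-step penalty is one simple way to express the completion-time objective as a reward.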
ISSN: 2327-4662, 2372-2541
DOI: 10.1109/JIOT.2021.3102185