Fast-DRD: Fast decentralized reinforcement distillation for deadline-aware edge computing

Bibliographic Details
Published in: Information Processing & Management, Vol. 59, No. 2, p. 102850
Main Authors: Song, Shinan; Fang, Zhiyi; Jiang, Jingyan
Format: Journal Article
Language: English
Published: Oxford: Elsevier Ltd (Elsevier Science Ltd), 01.03.2022
Summary: Edge computing has recently gained momentum as it provides computing services to mobile devices over high-speed networks. In edge computing system optimization, deep reinforcement learning (DRL) enhances the quality of service (QoS) and shortens the age of information (AoI). However, loosely coupled edge servers saturate a noisy data space for DRL exploration, so learning a reasonable solution is enormously costly. Most existing works assume that the edge is an exactly observable system and harvest well-labeled data to pretrain DRL neural networks. This assumption, however, stands in opposition to the motivation of driving DRL to explore unknown information, and it increases scheduling and computing costs in large-scale dynamic systems. This article leverages DRL with a distillation module to improve learning efficiency for edge computing under partial observation. We formulate the deadline-aware offloading problem as a decentralized partially observable Markov decision process (Dec-POMDP) with distillation, called fast decentralized reinforcement distillation (Fast-DRD). Each edge server makes offloading decisions according to its own observations and learning strategy in a decentralized manner. By defining trajectory observation history (TOH) distillation and trust distillation to avoid overfitting, Fast-DRD learns a suitable offloading model in a noisy, partially observed edge system and reduces the communication cost among servers. Finally, experimental simulations are presented to evaluate and compare the effectiveness and complexity of Fast-DRD.

Highlights:
• To the best of our knowledge, Fast-DRD is the first work to investigate a Dec-POMDP model for the deadline-aware offloading problem. Fast-DRD enables distributed offloading and decentralized learning for loosely coupled edge servers with lower synchronization requirements, especially in unknown data spaces or under poor communication with the central cloud.
• Random exploration produces a non-IID data space and hinders DRL efficiency at the edge. Building on the Dec-POMDP formulation, we put forward the concept of the trajectory observation history (TOH) as the basic distillation unit. A TOH decomposes the optimization goal into ephemeral estimated rewards and accumulated real rewards, harvesting valuable knowledge while filtering out the noise in DRL; a minimal sketch of this decomposition appears after the highlights.
• We conduct simulation experiments on multi-server edge computing offloading. The results show that, compared with naive Policy Distillation, Fast-DRD's two-stage distillation dramatically reduces the amount of exchanged data: learning time and data-interaction cost decrease by nearly 90%. In a complex environment of heterogeneous users with partial observation, offloading models learned decentrally by Fast-DRD still maintain offloading efficiency.
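The abstract does not define the TOH precisely, so the following Python sketch is only one plausible reading of a trajectory observation history as a distillation unit, assuming a standard discounted-return decomposition into estimated and real rewards. Every name (Transition, TrajectoryObservationHistory, estimation_gap, select_for_distillation) and parameter (gamma, max_gap) is hypothetical and not taken from the paper.

    # Illustrative sketch only, not the paper's implementation.
    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class Transition:
        observation: Tuple[float, ...]  # edge server's partial observation
        action: int                     # offloading decision
        estimated_reward: float         # ephemeral reward estimated at decision time
        real_reward: float              # reward actually realized (e.g., deadline met)

    @dataclass
    class TrajectoryObservationHistory:
        """One server's local trajectory under partial observation."""
        transitions: List[Transition] = field(default_factory=list)

        def accumulated_real_reward(self, gamma: float = 0.99) -> float:
            # Discounted sum of the rewards actually realized along the trajectory.
            return sum(gamma ** t * tr.real_reward
                       for t, tr in enumerate(self.transitions))

        def estimation_gap(self) -> float:
            # Mean |estimated - real| reward gap; a large gap marks a noisy
            # trajectory whose knowledge could be filtered out before distillation.
            if not self.transitions:
                return 0.0
            return sum(abs(tr.estimated_reward - tr.real_reward)
                       for tr in self.transitions) / len(self.transitions)

    # Example use: keep only low-noise TOHs as candidates for distillation.
    def select_for_distillation(tohs: List[TrajectoryObservationHistory],
                                max_gap: float = 0.5) -> List[TrajectoryObservationHistory]:
        return [toh for toh in tohs if toh.estimation_gap() <= max_gap]

Filtering trajectories by the gap between estimated and realized rewards is one way such a decomposition could harvest valuable knowledge while discarding noisy exploration, as the second highlight describes.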
ISSN: 0306-4573, 1873-5371
DOI: 10.1016/j.ipm.2021.102850