Resource Pricing and Allocation in MEC Enabled Blockchain Systems: An A3C Deep Reinforcement Learning Approach


Bibliographic Details
Published in: IEEE Transactions on Network Science and Engineering, Vol. 9, No. 1, pp. 33-44
Main Authors: Du, Jianbo; Cheng, Wenjie; Lu, Guangyue; Cao, Haotong; Chu, Xiaoli; Zhang, Zhicai; Wang, Junxuan
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2022
Summary: When using blockchain in mobile systems, computation-intensive mining tasks pose great challenges to the processing capabilities of mobile miner equipment. Mobile edge computing (MEC) is an effective solution for alleviating this problem via task offloading. In the mining process, miners compete for rewards through puzzle solving, where only the miner that first completes the process is rewarded. Thus, miners may wish to pay a higher price and use more communication resources in task offloading and more computation resources in task processing to reduce latency. However, miners risk not profiting from consuming more resources or paying a higher price, so miners in blockchain systems behave rationally. To maximize the total profit of all rational miners, we use an asynchronous advantage actor-critic (A3C) deep reinforcement learning algorithm to determine resource pricing and allocation, taking into account the stochastic properties of wireless channels, and we employ prospect theory to strike a good balance between risks and rewards. Numerical results show that our proposed A3C-based joint optimization algorithm converges quickly and outperforms the baseline algorithms in terms of total reward.
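The summary mentions using prospect theory to weigh a miner's risk of paying for resources without winning the mining reward. As a minimal sketch, the classic Tversky-Kahneman value function captures this risk/reward asymmetry: gains are valued concavely, while losses are amplified by a loss-aversion factor. The parameters below (alpha, beta, lam) are the textbook defaults, not necessarily the values used in the paper:

```python
import numpy as np

def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Tversky-Kahneman prospect-theory value function.

    x is the outcome relative to a reference point (e.g. a miner's
    profit minus the expected cost of offloading). Gains (x >= 0)
    are valued concavely; losses are convex and scaled by the
    loss-aversion factor lam > 1, so a loss hurts more than an
    equal-sized gain helps.
    """
    x = np.asarray(x, dtype=float)
    return np.where(
        x >= 0,
        np.power(np.abs(x), alpha),          # diminishing value of gains
        -lam * np.power(np.abs(x), beta),    # amplified value of losses
    )
```

With these defaults, a loss of 100 is felt more strongly than a gain of 100, which is the asymmetry that makes a rational miner cautious about over-purchasing resources.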
ISSN: 2327-4697
EISSN: 2334-329X
DOI: 10.1109/TNSE.2021.3068340