Hierarchical Hybrid Multi-Agent Deep Reinforcement Learning for Peer-to-Peer Energy Trading among Multiple Heterogeneous Microgrids

Bibliographic Details
Published in: IEEE Transactions on Smart Grid, Vol. 14, No. 6, p. 1
Main Authors: Wu, Yuxin; Zhao, Tianyang; Yan, Haoyuan; Liu, Min; Liu, Nian
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.11.2023
Summary: Peer-to-peer (P2P) energy trading among multiple microgrids (MGs) has emerged as a promising paradigm for more efficient supply-demand balancing within local areas. However, existing works still exhibit limitations in their trading architectures and pricing schemes, and existing multi-agent deep reinforcement learning (MADRL) methods suffer from computational overload caused by exploring the joint, hybrid action space during centralized training. In this paper, we propose a P2P energy trading paradigm based on hierarchical hybrid MADRL to maximize the trading profits of multiple heterogeneous MGs. First, we design a novel hierarchical structure for the MG agent to model the coupled interaction between flexible demand scheduling and autonomous quotation. Then, we propose a P2P market that employs an improved mid-market rate (IMMR) pricing scheme to incentivize participation in local trading. Furthermore, to handle the hybrid discrete-continuous action space and reduce computational complexity, we propose a hierarchical hybrid multi-agent double deep Q-network and deep deterministic policy gradient (hh-MADDQN-DDPG) algorithm that splits the optimal-policy learning workload into a sequence of two sub-tasks: a DDQN for flexible demand scheduling and a DDPG for energy trading. Numerical results of simulation I demonstrate that hh-MADDQN-DDPG with IMMR increases trading profits by 25% on average over the baselines. Results of simulation II show that hh-MADDQN-DDPG provides higher profits than existing methods while maintaining better computational performance and scalability.
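The abstract credits part of the profit gain to the improved mid-market rate (IMMR) pricing scheme but does not spell out the improvement, so the Python sketch below shows only the standard mid-market-rate (MMR) mechanism that IMMR builds on: internal P2P trades clear at the midpoint of the utility retail price and the feed-in tariff, and whichever side of the market is long has its imbalance settled with the grid at the grid's less favorable price. The function name and the tariff values are illustrative assumptions, not taken from the paper.

```python
def mid_market_rate(supply_kwh, demand_kwh, retail_price=0.30, feed_in_tariff=0.08):
    """Baseline mid-market-rate (MMR) clearing for a P2P energy pool.

    Returns (buy_price, sell_price) per kWh. Internal trades clear at the
    midpoint of the utility retail price and the feed-in tariff; any surplus
    or shortfall is settled with the grid, which blends the price seen by
    the long side of the market. Tariff defaults are hypothetical.
    """
    mid = (retail_price + feed_in_tariff) / 2.0
    if supply_kwh == demand_kwh:          # balanced pool: everyone trades at mid
        return mid, mid
    if supply_kwh > demand_kwh:           # surplus exported at the lower feed-in tariff
        sell = (mid * demand_kwh
                + feed_in_tariff * (supply_kwh - demand_kwh)) / supply_kwh
        return mid, sell                  # buyers still pay mid; sellers' revenue is diluted
    buy = (mid * supply_kwh               # shortfall imported at the higher retail price
           + retail_price * (demand_kwh - supply_kwh)) / demand_kwh
    return buy, mid                       # sellers still receive mid; buyers' cost rises


# Example: the pool is 20 kWh short, so buyers pay a blend of mid and retail.
buy, sell = mid_market_rate(supply_kwh=80.0, demand_kwh=100.0)
print(f"buy = {buy:.3f}, sell = {sell:.3f}")   # buy = 0.212, sell = 0.190
```

Because the blended price always sits between the feed-in tariff and the retail price, both buyers and sellers do at least as well trading locally as with the utility, which is the kind of participation incentive the abstract refers to.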
ISSN: 1949-3053, 1949-3061
DOI: 10.1109/TSG.2023.3250321
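On the algorithmic side, the abstract describes splitting each MG agent's hybrid decision into two sequential sub-tasks: a DDQN that picks a discrete flexible-demand schedule, followed by a DDPG that emits a continuous trading quotation. The PyTorch sketch below shows one plausible wiring of that hierarchy; the network sizes, the two-dimensional (price, quantity) quote, and conditioning the quoter on the chosen schedule are assumptions rather than the paper's exact architecture, and training machinery (replay buffers, target networks, centralized critics) is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SchedulerDDQN(nn.Module):
    """High-level policy: Q-values over discrete flexible-demand schedules."""

    def __init__(self, state_dim, n_schedules, hidden=64):
        super().__init__()
        self.q = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_schedules),
        )

    def forward(self, state):
        return self.q(state)  # shape: (batch, n_schedules)


class QuoterDDPG(nn.Module):
    """Low-level policy: continuous (price, quantity) quote in [-1, 1],
    conditioned on the state and the schedule chosen by the DDQN."""

    def __init__(self, state_dim, n_schedules, quote_dim=2, hidden=64):
        super().__init__()
        self.pi = nn.Sequential(
            nn.Linear(state_dim + n_schedules, hidden), nn.ReLU(),
            nn.Linear(hidden, quote_dim), nn.Tanh(),
        )

    def forward(self, state, schedule_onehot):
        return self.pi(torch.cat([state, schedule_onehot], dim=-1))


def act(state, ddqn, ddpg, n_schedules, epsilon=0.05):
    """Two-level action selection: schedule first, then quote."""
    with torch.no_grad():
        if torch.rand(()) < epsilon:           # epsilon-greedy on the discrete level
            schedule = torch.randint(n_schedules, (state.shape[0],))
        else:
            schedule = ddqn(state).argmax(dim=-1)
        onehot = F.one_hot(schedule, n_schedules).float()
        quote = ddpg(state, onehot)            # rescale to market bounds downstream
    return schedule, quote


state = torch.randn(1, 8)                      # toy 8-dimensional MG state
schedule, quote = act(state, SchedulerDDQN(8, 4), QuoterDDPG(8, 4), n_schedules=4)
```

Decomposing the joint discrete-continuous space this way is what avoids the combinatorial blow-up the abstract attributes to centralized training over the joint hybrid action space: each level explores only its own, smaller action set.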