Hierarchical Hybrid Multi-Agent Deep Reinforcement Learning for Peer-to-Peer Energy Trading among Multiple Heterogeneous Microgrids
Published in: IEEE Transactions on Smart Grid, Vol. 14, No. 6, p. 1
Main Authors: , , , ,
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.11.2023
Summary: Peer-to-peer (P2P) energy trading among multiple microgrids (MGs) has emerged as a promising paradigm for more efficient supply-demand balancing within local areas. However, existing works still exhibit limitations in their trading architectures and pricing schemes. In addition, existing multi-agent deep reinforcement learning (MADRL) methods suffer from computational overload caused by exploring the joint, hybrid action space during centralized training. In this paper, we propose a P2P energy trading paradigm based on hierarchical hybrid MADRL to maximize the trading profits of multiple heterogeneous MGs. First, we design a novel hierarchical structure for the MG agent to model the coupled interaction between flexible demand scheduling and autonomous quotation. Then, a P2P market employing an improved mid-market rate (IMMR) pricing scheme is proposed to incentivize participation in local trading. Furthermore, to handle the hybrid discrete-continuous action space and reduce computational complexity, we propose a hierarchical hybrid multi-agent double deep Q-network and deep deterministic policy gradient (hh-MADDQN-DDPG) algorithm that splits the optimal policy learning workload into a sequence of two sub-tasks: DDQN for flexible demand scheduling and DDPG for energy trading. Numerical results of simulation I demonstrate that hh-MADDQN-DDPG with IMMR increases trading profits by 25% on average over the baselines. Results of simulation II show that hh-MADDQN-DDPG yields higher profits than existing methods while maintaining better computational performance and scalability.
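The IMMR scheme itself is defined in the paper; as background, the standard mid-market rate (MMR) pricing it improves upon can be sketched as follows. Internal P2P prices sit at the midpoint of the utility's retail and feed-in prices when local supply and demand balance, and are volume-weighted toward the grid price that clears the imbalance otherwise. All function and variable names here are illustrative, not taken from the paper.

```python
def mmr_prices(total_sell, total_buy, retail, feed_in):
    """Plain mid-market-rate internal prices for a P2P energy market.

    total_sell: energy offered by surplus MGs (kWh)
    total_buy:  energy requested by deficit MGs (kWh)
    retail:     price MGs pay when buying from the utility grid
    feed_in:    price the utility grid pays MGs for exports
    Returns (buy_price, sell_price) inside the local market.
    """
    mid = (retail + feed_in) / 2.0
    if total_sell == total_buy:
        return mid, mid  # balanced: everyone trades at the mid rate
    if total_sell > total_buy:
        # local surplus: leftover energy is exported at the feed-in tariff,
        # so sellers receive a volume-weighted price below the mid rate
        sell_price = (total_buy * mid + (total_sell - total_buy) * feed_in) / total_sell
        return mid, sell_price
    # local deficit: the shortfall is imported at the retail price,
    # so buyers pay a volume-weighted price above the mid rate
    buy_price = (total_sell * mid + (total_buy - total_sell) * retail) / total_buy
    return buy_price, mid
```

Because the internal buy price never exceeds retail and the internal sell price never falls below feed-in, trading locally weakly dominates trading with the grid, which is the participation incentive the abstract refers to.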
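The hierarchical hybrid action structure described in the abstract — a DDQN-style head choosing a discrete flexible-demand scheduling action, and a DDPG-style policy producing a continuous trading quotation conditioned on that choice — can be sketched at inference time as below. The networks are stand-in random linear maps, and all dimensions and names are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATE, N_DISC, N_CONT = 4, 3, 2     # assumed sizes: state features,
                                      # discrete scheduling actions,
                                      # continuous quote dimensions

# Stub value network: state -> Q-value per discrete scheduling action
# (a trained double DQN in the paper; a fixed random map here).
W_q = rng.normal(size=(N_STATE, N_DISC))

# Stub deterministic policy: (state, discrete action one-hot) -> quote,
# e.g. [trade quantity, price offset] (a trained DDPG actor in the paper).
W_mu = rng.normal(size=(N_STATE + N_DISC, N_CONT))

def select_action(state):
    # Level 1 (DDQN-style): greedy discrete flexible-demand scheduling action
    q_values = state @ W_q
    a_disc = int(np.argmax(q_values))
    # Level 2 (DDPG-style): continuous quotation conditioned on level 1,
    # squashed by tanh so each component is bounded in [-1, 1]
    one_hot = np.eye(N_DISC)[a_disc]
    a_cont = np.tanh(np.concatenate([state, one_hot]) @ W_mu)
    return a_disc, a_cont

state = rng.normal(size=N_STATE)
a_disc, a_cont = select_action(state)
```

Factoring the hybrid action this way lets each level explore only its own sub-space instead of the joint discrete-continuous space, which is the source of the computational savings the abstract claims.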
ISSN: 1949-3053, 1949-3061
DOI: 10.1109/TSG.2023.3250321