Hierarchical Hybrid Multi-Agent Deep Reinforcement Learning for Peer-to-Peer Energy Trading among Multiple Heterogeneous Microgrids
Published in: IEEE Transactions on Smart Grid, Vol. 14, No. 6, p. 1
Main Authors: , , , ,
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.11.2023
Summary: Peer-to-peer (P2P) energy trading among multiple microgrids (MGs) has emerged as a promising paradigm for more efficient supply-demand balancing within local areas. However, existing works still exhibit limitations in their trading architectures and pricing schemes. In addition, existing multi-agent deep reinforcement learning (MADRL) methods suffer from computational overload caused by exploring the joint, hybrid action space during centralized training. In this paper, we propose a P2P energy trading paradigm based on hierarchical hybrid MADRL to maximize the trading profits of multiple heterogeneous MGs. First, we design a novel hierarchical structure for the MG agent to model the coupled interaction between flexible demand scheduling and autonomous quotation. Then, a P2P market employing an improved mid-market rate (IMMR) pricing scheme is proposed to incentivize participation in local trading. Furthermore, to handle the hybrid discrete-continuous action space and reduce computational complexity, we propose a hierarchical hybrid multi-agent double deep Q-network and deep deterministic policy gradient (hh-MADDQN-DDPG) algorithm that splits the optimal policy learning workload into a sequence of two sub-tasks: DDQN for flexible demand scheduling and DDPG for energy trading. Numerical results of simulation I demonstrate that hh-MADDQN-DDPG with IMMR increases trading profits by 25% on average over the baselines. Results of simulation II show that hh-MADDQN-DDPG yields higher profits than existing methods while maintaining better computational performance and scalability.
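The IMMR scheme itself is defined in the paper; as background, the standard mid-market rate (MMR) pricing it improves upon can be sketched as follows. Internal P2P prices sit at the midpoint of the utility's retail and feed-in prices when local supply and demand balance, and are volume-weighted toward the grid price that clears the imbalance otherwise. All function and variable names here are illustrative, not taken from the paper.

```python
def mmr_prices(total_sell, total_buy, retail, feed_in):
    """Plain mid-market-rate internal prices for a P2P energy market.

    total_sell: energy offered by surplus MGs (kWh)
    total_buy:  energy requested by deficit MGs (kWh)
    retail:     price MGs pay when buying from the utility grid
    feed_in:    price the utility grid pays MGs for exports
    Returns (buy_price, sell_price) inside the local market.
    """
    mid = (retail + feed_in) / 2.0
    if total_sell == total_buy:
        return mid, mid  # balanced: everyone trades at the mid rate
    if total_sell > total_buy:
        # local surplus: leftover energy is exported at the feed-in tariff,
        # so sellers receive a volume-weighted price below the mid rate
        sell_price = (total_buy * mid + (total_sell - total_buy) * feed_in) / total_sell
        return mid, sell_price
    # local deficit: the shortfall is imported at the retail price,
    # so buyers pay a volume-weighted price above the mid rate
    buy_price = (total_sell * mid + (total_buy - total_sell) * retail) / total_buy
    return buy_price, mid
```

Because the internal buy price never exceeds retail and the internal sell price never falls below feed-in, trading locally weakly dominates trading with the grid, which is the participation incentive the abstract refers to.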
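The hierarchical hybrid action structure described in the abstract — a DDQN-style head choosing a discrete flexible-demand scheduling action, and a DDPG-style policy producing a continuous trading quotation conditioned on that choice — can be sketched at inference time as below. The networks are stand-in random linear maps, and all dimensions and names are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATE, N_DISC, N_CONT = 4, 3, 2     # assumed sizes: state features,
                                      # discrete scheduling actions,
                                      # continuous quote dimensions

# Stub value network: state -> Q-value per discrete scheduling action
# (a trained double DQN in the paper; a fixed random map here).
W_q = rng.normal(size=(N_STATE, N_DISC))

# Stub deterministic policy: (state, discrete action one-hot) -> quote,
# e.g. [trade quantity, price offset] (a trained DDPG actor in the paper).
W_mu = rng.normal(size=(N_STATE + N_DISC, N_CONT))

def select_action(state):
    # Level 1 (DDQN-style): greedy discrete flexible-demand scheduling action
    q_values = state @ W_q
    a_disc = int(np.argmax(q_values))
    # Level 2 (DDPG-style): continuous quotation conditioned on level 1,
    # squashed by tanh so each component is bounded in [-1, 1]
    one_hot = np.eye(N_DISC)[a_disc]
    a_cont = np.tanh(np.concatenate([state, one_hot]) @ W_mu)
    return a_disc, a_cont

state = rng.normal(size=N_STATE)
a_disc, a_cont = select_action(state)
```

Factoring the hybrid action this way lets each level explore only its own sub-space instead of the joint discrete-continuous space, which is the source of the computational savings the abstract claims.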
ISSN: 1949-3053, 1949-3061
DOI: 10.1109/TSG.2023.3250321