A multi-action deep reinforcement learning framework for flexible Job-shop scheduling problem
| Published in | Expert Systems with Applications, Vol. 205, p. 117796 |
|---|---|
| Main Authors | , , , , , , |
| Format | Journal Article |
| Language | English |
| Published | Elsevier Ltd, 01.11.2022 |
Summary:
- An end-to-end DRL-based framework is introduced to solve the FJSP.
- Multi-PPO is used to learn job operation action and machine action sub-policies in MPGN.
- The proposed DRL shows its robustness via random and benchmark test instances.
This paper presents an end-to-end deep reinforcement learning framework that automatically learns a policy for solving the flexible Job-shop scheduling problem (FJSP) using a graph neural network. In the FJSP environment, the reinforcement learning agent must schedule an operation belonging to a job on an eligible machine chosen from a set of compatible machines at each timestep, so the agent has to control multiple actions simultaneously. Such a multi-action problem is formulated as a multiple Markov decision process (MMDP). To solve the MMDP, we propose a multi-pointer graph network (MPGN) architecture and a training algorithm called multi-Proximal Policy Optimization (multi-PPO) that learns two sub-policies: a job operation action policy and a machine action policy that assigns the selected job operation to a machine. The MPGN architecture consists of two encoder-decoder components, which define the job operation action policy and the machine action policy for predicting probability distributions over operations and machines, respectively. We introduce a disjunctive graph representation of the FJSP and use a graph neural network to embed the local state encountered during scheduling. The computational experiments show that the agent learns a high-quality dispatching policy, outperforming handcrafted heuristic dispatching rules in solution quality and meta-heuristic algorithms in running time. Moreover, the results on random and benchmark instances demonstrate that the learned policies generalize well to real-world instances and to significantly larger-scale instances with up to 2000 operations.
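The abstract describes a hierarchical two-sub-policy decision at each timestep: first an operation sub-policy selects an eligible job operation, then a machine sub-policy selects a compatible machine for it. The sketch below illustrates that action-selection structure only; it is not the authors' implementation, and the embedding dimensions, layer shapes, and all names (e.g. `TwoSubPolicyDecoder`, `op_scorer`) are assumptions made for the example.

```python
# Illustrative sketch of the two-sub-policy (operation, then machine) action
# selection described in the abstract. All sizes and names are assumptions.
import torch
import torch.nn as nn
from torch.distributions import Categorical

class TwoSubPolicyDecoder(nn.Module):
    def __init__(self, embed_dim: int = 64):
        super().__init__()
        # Scores each operation node embedding (e.g. produced by a GNN over
        # the disjunctive graph) against a pooled embedding of the state.
        self.op_scorer = nn.Linear(2 * embed_dim, 1)
        # Scores each machine embedding conditioned on the chosen operation.
        self.mach_scorer = nn.Linear(2 * embed_dim, 1)

    def forward(self, op_emb, mach_emb, op_mask, mach_mask, state_emb):
        # op_emb:    (num_ops, d)       operation node embeddings
        # mach_emb:  (num_machines, d)  machine embeddings
        # op_mask / mach_mask: bool tensors, True where the action is feasible
        # state_emb: (d,)               pooled embedding of the current state

        # --- operation sub-policy ---
        ctx = state_emb.expand(op_emb.size(0), -1)
        op_logits = self.op_scorer(torch.cat([op_emb, ctx], dim=-1)).squeeze(-1)
        op_logits = op_logits.masked_fill(~op_mask, float("-inf"))
        op_dist = Categorical(logits=op_logits)
        op_action = op_dist.sample()

        # --- machine sub-policy, conditioned on the chosen operation ---
        chosen = op_emb[op_action].expand(mach_emb.size(0), -1)
        m_logits = self.mach_scorer(torch.cat([mach_emb, chosen], dim=-1)).squeeze(-1)
        m_logits = m_logits.masked_fill(~mach_mask, float("-inf"))
        mach_dist = Categorical(logits=m_logits)
        mach_action = mach_dist.sample()

        # Joint log-probability that a PPO-style update could use for the
        # clipped surrogate objective of each sub-policy.
        log_prob = op_dist.log_prob(op_action) + mach_dist.log_prob(mach_action)
        return op_action.item(), mach_action.item(), log_prob
```

Masking infeasible operations and machines with `-inf` logits keeps the sampled actions valid at every timestep, which matches the abstract's requirement that only eligible machines from the compatible set can be chosen.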
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2022.117796