A multi-action deep reinforcement learning framework for flexible Job-shop scheduling problem

Bibliographic Details
Published in: Expert Systems with Applications, Vol. 205, p. 117796
Main Authors: Lei, Kun; Guo, Peng; Zhao, Wenchao; Wang, Yi; Qian, Linmao; Meng, Xiangyin; Tang, Liansheng
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.11.2022

Summary:
•An end-to-end DRL-based framework is introduced to solve the FJSP.
•Multi-PPO is used to learn the job operation action and machine action sub-policies in the MPGN.
•The proposed DRL method shows its robustness on random and benchmark test instances.

This paper presents an end-to-end deep reinforcement learning framework that automatically learns a policy for solving the flexible job-shop scheduling problem (FJSP) using a graph neural network. In the FJSP environment, the agent must schedule an operation belonging to a job on an eligible machine among a set of compatible machines at each timestep; that is, the agent controls multiple actions simultaneously. Such a multi-action problem is formulated as a multiple Markov decision process (MMDP). To solve MMDPs, we propose a multi-pointer graph network (MPGN) architecture and a training algorithm called multi-Proximal Policy Optimization (multi-PPO), which learns two sub-policies: a job operation action policy and a machine action policy that assigns a job operation to a machine. The MPGN architecture consists of two encoder-decoder components, which define the job operation action policy and the machine action policy for predicting probability distributions over operations and machines, respectively. We introduce a disjunctive graph representation of the FJSP and use a graph neural network to embed the local state encountered during scheduling. Computational experiments show that the agent learns a high-quality dispatching policy, outperforming handcrafted heuristic dispatching rules in solution quality and meta-heuristic algorithms in running time. Moreover, results on random and benchmark instances demonstrate that the learned policies generalize well to real-world instances and to significantly larger instances with up to 2000 operations.
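
To make the multi-action decision step concrete, here is a minimal PyTorch sketch (not the authors' code) of how two masked sub-policies could jointly pick an operation and then a compatible machine at each timestep; the function name `select_action`, the logit/mask tensors, and the masking scheme are all illustrative assumptions, not details from the paper.

```python
import torch
from torch.distributions import Categorical

def select_action(op_logits, op_mask, mach_logits, mach_mask):
    """Sample an (operation, machine) pair from two masked sub-policies.

    op_logits:   (num_ops,) scores from the operation decoder
    op_mask:     (num_ops,) bool, True where the operation is schedulable
    mach_logits: (num_ops, num_machines) scores from the machine decoder
    mach_mask:   (num_ops, num_machines) bool, True where machine is compatible
    """
    neg = torch.finfo(op_logits.dtype).min
    # Job operation sub-policy: distribution over eligible operations only.
    op_dist = Categorical(logits=op_logits.masked_fill(~op_mask, neg))
    op = op_dist.sample()
    # Machine sub-policy: distribution over machines compatible with `op`.
    mach_dist = Categorical(
        logits=mach_logits[op].masked_fill(~mach_mask[op], neg))
    mach = mach_dist.sample()
    # Joint log-probability of the multi-action: sum of the two sub-policies.
    log_prob = op_dist.log_prob(op) + mach_dist.log_prob(mach)
    return op.item(), mach.item(), log_prob

# Example with 3 operations and 2 machines:
op, mach, logp = select_action(
    torch.randn(3), torch.tensor([True, False, True]),
    torch.randn(3, 2), torch.ones(3, 2, dtype=torch.bool))
```

Summing the two log-probabilities treats the (operation, machine) pair as one joint action, which is what a multi-PPO style update can then optimize.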
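
Given that joint log-probability, a multi-PPO update can reuse the standard PPO clipped surrogate. The sketch below assumes the common clipping formulation and hypothetical tensor arguments; it is one plausible reading of "multi-PPO", not the paper's exact objective.

```python
import torch

def multi_ppo_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate over the joint (operation + machine) log-probs."""
    ratio = torch.exp(new_logp - old_logp)          # joint-policy ratio
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Negate because optimizers minimize; PPO maximizes the surrogate.
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```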
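
The disjunctive graph representation mentioned in the abstract can also be sketched in a few lines. This assumed construction (it omits the dummy source/sink nodes some formulations include) treats operations as nodes, draws conjunctive arcs between consecutive operations of the same job, and draws a disjunctive edge between any two operations whose compatible-machine sets overlap.

```python
from itertools import combinations

def build_disjunctive_graph(jobs):
    """jobs: per job, a list of operations; each operation is the set of
    machines able to process it, e.g. jobs = [[{0, 1}, {1}], [{0}, {1, 2}]].
    Returns operation nodes, conjunctive arcs (precedence within a job),
    and disjunctive edges (pairs of operations sharing a machine)."""
    ops = [(j, o) for j, job in enumerate(jobs) for o in range(len(job))]
    conjunctive = [((j, o), (j, o + 1))
                   for j, job in enumerate(jobs)
                   for o in range(len(job) - 1)]
    disjunctive = [(u, v) for u, v in combinations(ops, 2)
                   if jobs[u[0]][u[1]] & jobs[v[0]][v[1]]]
    return ops, conjunctive, disjunctive
```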
ISSN: 0957-4174
EISSN: 1873-6793
DOI: 10.1016/j.eswa.2022.117796