State-Dependent Parameter Tuning of the Apparent Tardiness Cost Dispatching Rule Using Deep Reinforcement Learning

Bibliographic Details
Published in	IEEE Access, Vol. 10, pp. 20187-20198
Main Authors	Min, Byungwook; Kim, Chang Ouk
Format	Journal Article
Language	English
Published	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2022
Subjects	Algorithms; Deep deterministic policy gradient; Deep learning; Dispatching; Dispatching rule; Dispatching rules; Dynamic scheduling; Heuristic algorithms; Job shop scheduling; Job shops; Machine learning; Parameter estimation; Parameter tuning; Q-learning; Reinforcement learning; Scheduling; Single machine scheduling; Tuning

More Information
Summary:	The apparent tardiness cost (ATC) is a dispatching rule that demonstrates excellent performance in minimizing the total weighted tardiness (TWT) in single-machine scheduling. The ATC rule's performance depends on the lookahead parameter of the equation that calculates the job priority index. Existing studies recommend either a fixed value or a value derived from a handcrafted function as an estimate of the lookahead parameter. However, such estimates inevitably lose information because they rely on summarized job data, and they therefore yield inferior schedules. This study proposes a reinforcement learning-based ATC dispatching rule that estimates the lookahead parameter directly from raw job data (processing time, weight, and slack time). Using a deep deterministic policy gradient (DDPG) algorithm, the scheduling agent learns the relationship between the raw job data and the continuous lookahead parameter while interacting with the scheduling environment. We trained the DDPG model to minimize the TWT through simulation of a single-machine scheduling problem with unequal job arrival times. A preliminary experiment verified that the proposed dispatching rule, ATC-DDPG, successfully performed intelligent state-dependent parameter tuning. In the main experiment, which compared it with five existing dispatching rules, ATC-DDPG also displayed the best performance.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2022.3152192
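To make the role of the lookahead parameter concrete, here is a minimal Python sketch of ATC dispatching on a single machine with unequal arrival times. It uses the commonly cited form of the ATC index, I_j(t) = (w_j / p_j) * exp(-max(d_j - p_j - t, 0) / (k * p_bar)), where d_j - p_j - t is the job's slack. Several assumptions are not taken from the record: p_bar is averaged over the jobs currently waiting, scheduling is non-preemptive, and k must be positive. The names Job, atc_index, simulate, and the k_policy hook are hypothetical. The hook receives the raw (processing time, weight, slack) data of the waiting jobs, which is where a fixed value, a handcrafted function, or a trained actor network could supply k. DDPG itself is not implemented here.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Job:
    release: float  # arrival (release) time r_j
    proc: float     # processing time p_j
    due: float      # due date d_j
    weight: float   # tardiness weight w_j


def atc_index(job: Job, t: float, k: float, p_bar: float) -> float:
    """ATC priority (w/p) * exp(-max(slack, 0) / (k * p_bar)); assumes k, p_bar > 0."""
    slack = job.due - job.proc - t
    return (job.weight / job.proc) * math.exp(-max(slack, 0.0) / (k * p_bar))


# Hypothetical hook: maps raw data of waiting jobs, as (p, w, slack) tuples, to k.
KPolicy = Callable[[Sequence[Tuple[float, float, float]]], float]


def simulate(jobs: List[Job], k_policy: KPolicy) -> Tuple[List[int], float]:
    """Non-preemptive single-machine ATC dispatching; returns (job order, TWT)."""
    pending = list(range(len(jobs)))
    t, sequence, twt = 0.0, [], 0.0
    while pending:
        ready = [i for i in pending if jobs[i].release <= t]
        if not ready:
            # Machine is idle: jump to the next arrival.
            t = min(jobs[i].release for i in pending)
            continue
        # Raw state at this decision point: (processing time, weight, slack).
        state = [(jobs[i].proc, jobs[i].weight, jobs[i].due - jobs[i].proc - t)
                 for i in ready]
        k = k_policy(state)  # fixed or state-dependent lookahead parameter
        p_bar = sum(jobs[i].proc for i in ready) / len(ready)
        best = max(ready, key=lambda i: atc_index(jobs[i], t, k, p_bar))
        t += jobs[best].proc  # completion time C_j
        twt += jobs[best].weight * max(t - jobs[best].due, 0.0)
        sequence.append(best)
        pending.remove(best)
    return sequence, twt


def hypothetical_policy(state: Sequence[Tuple[float, float, float]]) -> float:
    """Illustrative handcrafted rule (not from the paper): larger k when more
    waiting jobs still have positive slack."""
    frac_loose = sum(1 for _, _, slack in state if slack > 0) / len(state)
    return 0.5 + 3.0 * frac_loose


if __name__ == "__main__":
    jobs = [Job(0, 3, 3, 1), Job(0, 1, 10, 1), Job(2, 2, 4, 3)]
    print(simulate(jobs, lambda s: 2.0))  # → ([0, 2, 1], 3.0)
    print(simulate(jobs, hypothetical_policy))
```

Passing a constant lambda reproduces the fixed-k variant of ATC. Any state-dependent rule, including a learned model, only needs to match the k_policy signature, so the dispatching and TWT logic stays unchanged when the way k is chosen changes.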