Dynamic replenishment policy for perishable goods using change point detection-based soft actor-critic reinforcement learning
| Published in | Expert systems with applications, Vol. 270, p. 126556 |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published | Elsevier Ltd, 25.04.2025 |
| Summary | This paper examines the problem of establishing a dynamic replenishment policy that minimizes the costs associated with selling perishable goods. Ideally, the perishable inventory closely matches the realized demand; however, the demand exhibits significant non-stationarity, characterized by dynamic changes in the stochastic demand distribution pattern. In this paper, the replenishment problem is modeled as a non-stationary Markov decision process (NSMDP) with unknown transition probabilities, and a deep reinforcement learning (DRL)-based solution framework is proposed for the NSMDP model. In this framework, a feature-enhanced long short-term memory (LSTM) network is employed to detect change points in real time. On this basis, the paper develops a change point detection-based soft actor-critic (CPD-SAC) algorithm that dynamically adjusts replenishment decisions to adapt to different states across the various stochastic demand distribution patterns. The numerical experiments first analyze the effect of sliding-window selection on the accuracy of change point detection (CPD). The proposed approach is then compared against several benchmark DRL algorithms and the static base-stock policy. Finally, a sensitivity analysis is conducted on key parameters, including lead time, lifetime, and the unit shortage cost of perishable goods. The results confirm the effectiveness of the proposed approach and identify the scenarios in which the dynamic replenishment policy is applicable. |
| ISSN | 0957-4174 |
| DOI | 10.1016/j.eswa.2025.126556 |
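
The abstract names two moving parts: sliding-window change point detection on a non-stationary demand stream, and a policy that re-adapts once a new demand regime is flagged. The paper's detector is a feature-enhanced LSTM and its policy is CPD-SAC; the sketch below is not that method, only a minimal illustration of the sliding-window CPD idea using a simple two-sample mean-shift statistic. The window size, detection threshold, demand rates, and base-stock update rule are all stand-in assumptions.

```python
import numpy as np

def detect_change(window, threshold=3.0):
    """Flag a change point when the mean demand in the two halves of a
    sliding window differs by more than `threshold` pooled standard errors.
    (A crude stand-in for the paper's feature-enhanced LSTM detector.)"""
    half = len(window) // 2
    left, right = window[:half], window[half:]
    pooled_se = np.sqrt(left.var(ddof=1) / half + right.var(ddof=1) / half)
    if pooled_se == 0:
        return False
    return abs(right.mean() - left.mean()) / pooled_se > threshold

rng = np.random.default_rng(0)
# Non-stationary demand: the Poisson rate shifts from 20 to 35 at t = 100.
demand = np.concatenate([rng.poisson(20, 100), rng.poisson(35, 100)])

window_size = 30  # sliding-window length (an assumed value; the paper
                  # studies how this choice affects detection accuracy)
for t in range(window_size, len(demand)):
    window = demand[t - window_size:t]
    if detect_change(window):
        # On detection, re-estimate a base-stock level from the post-change
        # half of the window (illustrative safety factor ~1.64).
        recent = window[window_size // 2:]
        base_stock = recent.mean() + 1.64 * recent.std(ddof=1)
        print(f"change point flagged near t={t}; "
              f"new base stock ~ {base_stock:.1f}")
        break
```

In the paper itself, the detected change point does not merely reset a base-stock level; it conditions the SAC agent so that replenishment actions adapt to the state of the newly identified demand distribution pattern.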