Dynamic replenishment policy for perishable goods using change point detection-based soft actor-critic reinforcement learning
| Published in | Expert systems with applications, Vol. 270, p. 126556 |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published | Elsevier Ltd, 25.04.2025 |
| Summary | This paper examines the problem of establishing a dynamic replenishment policy that minimizes the costs associated with selling perishable goods. Ideally, the perishable inventory closely matches the realized demand; however, the demand exhibits significant non-stationarity, characterized by dynamic changes in the stochastic demand distribution pattern. In this paper, the replenishment problem is modeled as a non-stationary Markov decision process (NSMDP) with unknown transition probabilities, and a deep reinforcement learning (DRL)-based solution framework is proposed for the NSMDP model. In this framework, a feature-enhanced long short-term memory (LSTM) network is employed to detect change points in real time. On this basis, the paper develops a change point detection-based soft actor-critic (CPD-SAC) algorithm that dynamically adjusts replenishment decisions to adapt to different states across the various stochastic demand distribution patterns. The numerical experiments first analyze the effect of sliding-window selection on the accuracy of change point detection (CPD). The proposed approach is then compared against several benchmark DRL algorithms and the static base-stock policy. Finally, a sensitivity analysis is conducted on key parameters, including lead time, lifetime, and the unit shortage cost of perishable goods. The results confirm the effectiveness of the proposed approach and identify the scenarios in which the dynamic replenishment policy is applicable. |
| ISSN | 0957-4174 |
| DOI | 10.1016/j.eswa.2025.126556 |
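
The abstract names two moving parts: sliding-window change point detection on a non-stationary demand stream, and a policy that re-adapts once a new demand regime is flagged. The paper's detector is a feature-enhanced LSTM and its policy is CPD-SAC; the sketch below is not that method, only a minimal illustration of the sliding-window CPD idea using a simple two-sample mean-shift statistic. The window size, detection threshold, demand rates, and base-stock update rule are all stand-in assumptions.

```python
import numpy as np

def detect_change(window, threshold=3.0):
    """Flag a change point when the mean demand in the two halves of a
    sliding window differs by more than `threshold` pooled standard errors.
    (A crude stand-in for the paper's feature-enhanced LSTM detector.)"""
    half = len(window) // 2
    left, right = window[:half], window[half:]
    pooled_se = np.sqrt(left.var(ddof=1) / half + right.var(ddof=1) / half)
    if pooled_se == 0:
        return False
    return abs(right.mean() - left.mean()) / pooled_se > threshold

rng = np.random.default_rng(0)
# Non-stationary demand: the Poisson rate shifts from 20 to 35 at t = 100.
demand = np.concatenate([rng.poisson(20, 100), rng.poisson(35, 100)])

window_size = 30  # sliding-window length (an assumed value; the paper
                  # studies how this choice affects detection accuracy)
for t in range(window_size, len(demand)):
    window = demand[t - window_size:t]
    if detect_change(window):
        # On detection, re-estimate a base-stock level from the post-change
        # half of the window (illustrative safety factor ~1.64).
        recent = window[window_size // 2:]
        base_stock = recent.mean() + 1.64 * recent.std(ddof=1)
        print(f"change point flagged near t={t}; "
              f"new base stock ~ {base_stock:.1f}")
        break
```

In the paper itself, the detected change point does not merely reset a base-stock level; it conditions the SAC agent so that replenishment actions adapt to the state of the newly identified demand distribution pattern.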