A dynamic multi-task selective execution policy considering stochastic dependence between degradation and random shocks by deep reinforcement learning


Bibliographic Details
Published in: Reliability Engineering & System Safety, Vol. 257, p. 110844
Main Authors: Liu, Lujie; Yang, Jun; Zheng, Huiling; Li, Lei; Wang, Ning
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.05.2025
ISSN: 0951-8320
DOI: 10.1016/j.ress.2025.110844

Summary:
• A dynamic multi-task selective execution policy for multi-task missions.
• The policy considers the stochastic dependence between degradation and shocks.
• The dynamic decision-making model is constructed as a Markov decision process.
• A deep reinforcement learning approach with action mask is tailored.
• The action mask technique is used to prevent the repeated selection of tasks.

To improve the efficiency of unmanned aerial vehicles (UAVs), it is common for a UAV to perform multiple tasks during each departure. However, existing mission abort policies primarily focus on scenarios where the system executes a single task and are not suitable for more complex multi-task missions. Moreover, in practice, degradation and random shocks often occur simultaneously, whereas existing studies typically consider only their separate effects on mission abort policies. To address these problems, a multi-task selective execution policy that accounts for the stochastic dependence between degradation and shocks is proposed to determine the next task for the system or the timing of mission abort. First, considering the health state of the system, its location information, and the completion state of tasks, a multi-task selective execution policy is proposed. Next, to maximize the cumulative reward of the system, the corresponding sequential decision problem is formulated as a Markov decision process. Then, to address the curse of dimensionality in the continuous state space, a solution method based on deep reinforcement learning algorithms is tailored, incorporating an action-masking technique to avoid repeated selection of already executed tasks. Finally, the effectiveness of the proposed method is verified through a numerical study of a UAV performing multiple reconnaissance tasks.
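The action-masking idea in the abstract — preventing the agent from re-selecting an already executed task — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the Q-value interface, and the convention that the last action index denotes mission abort are all assumptions made for the example.

```python
import numpy as np

def select_masked_action(q_values, completed_tasks):
    """Greedy action selection with an action mask.

    q_values: estimated values for each action; indices 0..n-2 are tasks,
              index n-1 is the abort action (an assumed convention).
    completed_tasks: set of task indices already executed this mission.

    Masking sets the value of invalid actions to -inf, so argmax can
    never return a task that was already selected.
    """
    q = np.asarray(q_values, dtype=float).copy()
    for task in completed_tasks:
        q[task] = -np.inf  # invalidate already executed tasks
    return int(np.argmax(q))

# Example: task 0 is done, so the agent must pick among tasks 1-2 or abort.
action = select_masked_action([0.9, 0.5, 0.2, 0.1], {0})
print(action)  # task 0 is masked out even though it has the highest value
```

In policy-gradient variants the same mask is typically applied to the logits before the softmax, which drives the masked actions' probabilities to zero rather than their values to minus infinity.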