Optimal Dynamic State‐Dependent Maintenance Policy by Deep Reinforcement Learning
Published in | Quality and Reliability Engineering International, Vol. 41, No. 6, pp. 2715–2728 |
---|---|
Main Authors | Eidi, Shaghayegh; Haghighi, Firoozeh; Safari, Abdollah; Zio, Enrico |
Format | Journal Article |
Language | English |
Published | Bognor Regis: Wiley Subscription Services, Inc., 01.10.2025 |
Subjects | Deep learning; Degradation; Machine learning; Maintenance; Markov processes; Optimization; Repair; Sensitivity analysis |
Online Access | https://www.proquest.com/docview/3245213744 |
ISSN | 0748-8017 |
EISSN | 1099-1638 |
DOI | 10.1002/qre.3806 |
Abstract | In this paper, we propose a new maintenance strategy considering “do nothing”, “imperfect repair”, and “replace” as alternative actions on a deteriorating system. The system is subject to random shocks that accelerate degradation. Unlike most existing works on maintenance with imperfect repair actions, we propose a dynamic improvement factor that changes according to the state of the system at maintenance time. The proposed improvement factor is considered to have a random rejuvenating effect on the system, which reduces its degradation level (state) by reducing age. Such a degradation state‐dependent improvement factor is more realistic than a fixed or random one, since the amount of improvement (rejuvenation) and the cost associated with maintenance are proportional to the system's needs as described by its degradation level. A Markov decision process is formulated to model the maintenance problem with a continuous state space, and a deep reinforcement learning algorithm is used to optimize the maintenance policy, with the decision maker trained by a Deep Q‐network. Central to this study is the comparison of three distinct models: a state‐independent improvement factor (Model I) versus two state‐dependent ones (Models II and III), with deterministic and stochastic repair effects, respectively. Through numerical and illustrative examples, this comparison underscores the importance of selecting the appropriate model when system condition data are available, demonstrating that state‐dependent models outperform their state‐independent counterparts in terms of cost‐efficiency and effectiveness. A sensitivity analysis is also conducted to examine the influence of the model's parameters on model selection. |
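Since the record describes the method only at the abstract level, the following is a minimal sketch of how such a three-action maintenance MDP and its Deep Q-network controller could be wired together: a continuous state (degradation level, virtual age), gamma-process wear accelerated by Poisson shocks, and a degradation-state-dependent improvement factor that rejuvenates the system under imperfect repair. All cost figures, the improvement-factor form, and the simplified training loop (no replay buffer or target network) are illustrative assumptions, not the paper's calibrated model.

```python
# Hedged sketch of a state-dependent-improvement maintenance MDP with a DQN.
# Every numerical parameter below is an illustrative assumption.
import numpy as np
import torch
import torch.nn as nn

DO_NOTHING, IMPERFECT_REPAIR, REPLACE = 0, 1, 2

class MaintenanceEnv:
    """Degrading system: gamma-process wear plus Poisson random shocks."""
    def __init__(self, failure_level=10.0, shock_rate=0.1, shock_size=1.0):
        self.failure_level = failure_level
        self.shock_rate = shock_rate
        self.shock_size = shock_size
        self.reset()

    def reset(self):
        self.x = 0.0    # degradation level
        self.age = 0.0  # virtual age
        return np.array([self.x, self.age], dtype=np.float32)

    def step(self, action):
        cost = 0.0
        if action == REPLACE:
            cost += 50.0
            self.x, self.age = 0.0, 0.0
        elif action == IMPERFECT_REPAIR:
            # State-dependent improvement factor: rejuvenation (and cost)
            # grow with the current degradation level (a Model-II-style
            # deterministic repair effect, assumed here for illustration).
            rho = min(0.9, 0.1 + 0.08 * self.x)
            cost += 5.0 + 2.0 * rho * self.x
            self.x *= (1.0 - rho)
            self.age *= (1.0 - rho)
        # One period of wear: gamma increment plus random shock arrivals.
        self.x += np.random.gamma(shape=1.0 + 0.1 * self.age, scale=0.3)
        self.x += self.shock_size * np.random.poisson(self.shock_rate)
        self.age += 1.0
        failed = self.x >= self.failure_level
        if failed:
            cost += 100.0  # corrective replacement on failure (renewal)
            self.x, self.age = 0.0, 0.0
        return np.array([self.x, self.age], dtype=np.float32), -cost, failed

# Q-network over the continuous state, one output per maintenance action.
q_net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 3))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
env, gamma, eps = MaintenanceEnv(), 0.95, 0.1

state = env.reset()
for step in range(20_000):
    # Epsilon-greedy action selection.
    if np.random.rand() < eps:
        action = np.random.randint(3)
    else:
        with torch.no_grad():
            action = int(q_net(torch.from_numpy(state)).argmax())
    next_state, reward, _ = env.step(action)
    # One-step TD target (no replay buffer/target net, to stay minimal).
    with torch.no_grad():
        target = reward + gamma * q_net(torch.from_numpy(next_state)).max()
    pred = q_net(torch.from_numpy(state))[action]
    loss = (pred - target) ** 2
    opt.zero_grad(); loss.backward(); opt.step()
    state = next_state
```

Failures act as costly renewals rather than terminal states, so the loop trains on a single infinite-horizon trajectory; a faithful reproduction of the paper would add the components elided here (experience replay, a target network, and the paper's own cost and degradation parameters).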
Author details |
– Shaghayegh Eidi, School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
– Firoozeh Haghighi (ORCID 0000-0003-1880-937X), School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
– Abdollah Safari, School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
– Enrico Zio, Center for Research on Risks and Crises (CRC), Mines Paris‐PSL University, Paris, France; Energy Department, Politecnico di Milano, Milan, Italy
Copyright | 2025 John Wiley & Sons Ltd. |
Discipline | Engineering |
Peer Reviewed | Yes |