Batch process control based on reinforcement learning with segmented prioritized experience replay

Bibliographic Details
Published in	Measurement science & technology Vol. 35; no. 5; p. 56202
Main Authors	Xu, Chen; Ma, Junwei; Tao, Hongfeng
Format	Journal Article
Language	English
Published	01.05.2024

Abstract	Batch processes are difficult to control accurately because of their complex nonlinear dynamics and unstable operating conditions. Traditional methods such as model predictive control suffer seriously degraded control performance when the process model is inaccurate. In contrast, reinforcement learning (RL) provides a viable alternative by interacting directly with the environment to learn an optimal strategy. This paper proposes a batch process controller based on soft actor-critic (SAC) with segmented prioritized experience replay (SPER). SAC combines off-policy updates and maximum entropy RL with an actor-critic formulation, which can yield a more robust control strategy than other RL methods. To improve the efficiency of the experience replay mechanism in tasks with long episodes and multiple phases, a new experience-sampling method, SPER, is designed for SAC. In addition, a novel reward function is designed for the SPER-SAC-based controller to deal with sparse rewards. Finally, the effectiveness of the SPER-SAC-based controller is demonstrated on batch process examples by comparison with conventional RL-based control methods.
Author	Tao, Hongfeng; Xu, Chen; Ma, Junwei
Author_xml	– sequence: 1 givenname: Chen orcidid: 0000-0002-5399-5297 surname: Xu fullname: Xu, Chen – sequence: 2 givenname: Junwei surname: Ma fullname: Ma, Junwei – sequence: 3 givenname: Hongfeng orcidid: 0000-0001-5279-2458 surname: Tao fullname: Tao, Hongfeng
ContentType	Journal Article
DOI	10.1088/1361-6501/ad21cf
Discipline	Sciences (General); Physics
EISSN	1361-6501
ISSN	0957-0233
IsPeerReviewed	true
IsScholarly	true
Issue	5
Language	English
ORCID	0000-0001-5279-2458 0000-0002-5399-5297
PublicationCentury	2000
PublicationDate	2024-05-01
PublicationDateYYYYMMDD	2024-05-01
PublicationDate_xml	– month: 05 year: 2024 text: 2024-05-01 day: 01
PublicationDecade	2020
PublicationTitle	Measurement science & technology
PublicationYear	2024
StartPage	56202
Title	Batch process control based on reinforcement learning with segmented prioritized experience replay
Volume	35
linkProvider	IOP Publishing