Alleviating parameter-tuning burden in reinforcement learning for large-scale process control

Bibliographic Details
Published in: Computers & Chemical Engineering, Vol. 158, p. 107658
Main Authors: Zhu, Lingwei; Takami, Go; Kawahara, Mizuo; Kanokogi, Hiroaki; Matsubara, Takamitsu
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.02.2022

Summary:
•First reinforcement learning algorithm for alleviating parameter tuning in large-scale process control problems.
•Ensuring monotonic improvement even with underperforming parameters.
•Factorial and random-feature approximation for efficient learning in large-scale spaces.
•Comprehensive experiments on the Vinyl Acetate Monomer process showing parameter robustness.

Modern process controllers require high-quality models and remedial system re-identification when performance degrades. Reinforcement Learning (RL) is a promising replacement for these laborious manual procedures. However, in realistic scenarios time is limited, so algorithms that learn robustly with reduced human-agent interaction or self-exploration, e.g. parameter tuning, are desired. In practice, a large portion of the time needed to set up a working RL algorithm is spent on such trial-and-error interactions. To reduce this interaction time, we propose a principled framework that ensures monotonic policy improvement even with underperforming parameters, enhancing the robustness of the RL process against parameter settings. We incorporate key ingredients such as random features and a factorial policy into the monotonic improvement mechanism so that the agent learns cautiously in large-scale process control problems. On challenging control problems for the simulated Vinyl Acetate Monomer process, we demonstrate that the proposed method robustly learns a meaningful policy within a short, fixed learning horizon under various parameter configurations that simulate these interactions, whereas a comparison method performs well only within a narrow range of parameters.
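As a rough illustration of two ingredients named in the summary, the sketch below combines a random Fourier feature map with a factorial (per-action-dimension) Gaussian policy whose mean is linear in the shared features. This is not the authors' implementation; the class names, feature count, bandwidth, and initial standard deviation are assumptions chosen only to make the example self-contained.

# Illustrative sketch only (not the paper's code): random Fourier features plus a
# factorial Gaussian policy, i.e. each control variable is sampled independently
# from its own Gaussian whose mean is linear in a shared random feature map.
import numpy as np

class RandomFourierFeatures:
    """Approximate an RBF kernel on the state space with D random cosine features."""
    def __init__(self, state_dim, n_features=256, bandwidth=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=1.0 / bandwidth, size=(n_features, state_dim))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        self.scale = np.sqrt(2.0 / n_features)

    def __call__(self, state):
        # phi(s) = sqrt(2/D) * cos(W s + b)
        return self.scale * np.cos(self.W @ np.asarray(state) + self.b)

class FactorialGaussianPolicy:
    """pi(a|s) = prod_i N(a_i; theta_i^T phi(s), sigma_i^2), one factor per action dimension."""
    def __init__(self, features, action_dim, init_std=0.1):
        self.features = features
        self.theta = np.zeros((action_dim, features.W.shape[0]))  # per-dimension weights
        self.log_std = np.full(action_dim, np.log(init_std))

    def sample(self, state, rng=np.random.default_rng()):
        phi = self.features(state)
        mean = self.theta @ phi                      # factorial means, shape (action_dim,)
        return rng.normal(mean, np.exp(self.log_std))

The factorial structure keeps the policy tractable when the number of manipulated variables is large, since each action dimension contributes its own independent Gaussian factor rather than a joint covariance.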
ISSN: 0098-1354, 1873-4375
DOI: 10.1016/j.compchemeng.2022.107658