Alleviating parameter-tuning burden in reinforcement learning for large-scale process control
Published in: Computers & Chemical Engineering, Vol. 158, p. 107658
Main Authors: , , , ,
Format: Journal Article
Language: English
Publisher: Elsevier Ltd, 01.02.2022
Summary:
• First reinforcement learning algorithm for alleviating parameter tuning in large-scale process control problems.
• Ensures monotonic improvement even with underperforming parameters.
• Factorial and random feature approximation for efficient learning in large-scale spaces.
• Comprehensive experiments on the Vinyl Acetate Monomer process showing parameter robustness.
Modern process controllers require high-quality models and remedial system re-identification when performance degrades. Reinforcement Learning (RL) is a promising replacement for these laborious manual procedures. However, in realistic scenarios time is limited, so algorithms that can learn robustly with reduced human-agent interaction or self-exploration (e.g., parameter tuning) are desired. In practice, a large portion of the time needed to set up an RL algorithm is spent on such trial-and-error interactions. To reduce this interaction time, we propose a principled framework that ensures monotonic policy improvement even with underperforming parameters, enhancing the robustness of the RL process against parameter settings. We incorporate key ingredients such as random features and a factorial policy into the monotonic improvement mechanism to learn cautiously in large-scale process control problems. On challenging control problems in the simulated vinyl acetate monomer process, we demonstrate that the proposed method robustly learns a meaningful policy within a short, fixed learning horizon under various parameter configurations that simulate these interactions, whereas a competing method performs well only within a narrow range of parameters.
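The abstract names two scalability ingredients: random feature approximation and a factorial policy. A minimal sketch of what these typically look like (this is an illustration under standard definitions, not the authors' implementation; all function names, shapes, and the bandwidth parameter here are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_fourier_features(states, n_features=256, bandwidth=1.0, rng=rng):
    """Map states of shape (n, d) to random cosine features of shape
    (n, n_features), approximating an RBF kernel so that value-function
    learning stays linear even in large state spaces."""
    n, d = states.shape
    # Random projection directions and phases define the feature map.
    W = rng.normal(scale=1.0 / bandwidth, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(states @ W + b)

def factorial_gaussian_logpdf(actions, means, stds):
    """Log-probability of a factorial Gaussian policy: each action
    dimension (e.g., each actuator) is an independent Gaussian, so the
    joint log-density is a sum over dimensions."""
    z = (actions - means) / stds
    return np.sum(-0.5 * z**2 - np.log(stds) - 0.5 * np.log(2 * np.pi),
                  axis=-1)
```

The factorial structure keeps the number of policy parameters linear in the action dimension, which is what makes large-scale plants such as the vinyl acetate monomer process tractable.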
ISSN: 0098-1354, 1873-4375
DOI: 10.1016/j.compchemeng.2022.107658