Continuous learning of the value function utilizing deep reinforcement learning to be used as the objective in model predictive control

Bibliographic Details
Published in: Computers & Chemical Engineering, Vol. 201, p. 109262
Main Authors: Beahr, Daniel; Hedrick, Elijah; Bhattacharyya, Debangsu
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.10.2025

Summary: Reinforcement learning (RL) and model predictive control (MPC) possess an inherent synergy in the manner in which they function. The work presented here investigates integrating RL with an existing MPC in order to provide a constrained policy for the RL while also creating an adaptable objective for the MPC. The selection of MPC for combination with RL is not arbitrary; two specific aspects of MPC are advantageous for such a combination: the use of a value function and the use of a model. The model in MPC is useful because, by solving for the optimal trajectory, a projected view of the expected reward is gained. While this projection can be inaccurate given the current value function, it can accelerate learning. By combining this with a correction for state transitions, an MPC formulation is derived that obeys the constraints set forth but can adapt to changing dynamics and correct for plant-model mismatch without requiring a discrete model update, an advantage over standard MPC formulations. We propose two algorithms for the proposed value-function model predictive controller (VFMPC): one denoted VFMPC(0), where the one-step return is used to learn the cost function, and the other denoted VFMPC(n), where the optimal trajectory is used to learn the n-step return subject to the dynamics of the process model. An artificial neural network (ANN) model is introduced into VFMPC(n), yielding a variant called VFMPC(NP), to improve controller performance under slowly changing dynamics and plant-model mismatch. The developed algorithms are demonstrated on two applications.
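
As an illustrative sketch only (the notation below is assumed here, not drawn from the article): with stage cost c, discount factor \gamma, process model x_{k+1} = f(x_k, u_k), and a parameterized value function V_\theta, a VFMPC(0)-style update would regress V_\theta(x_t) toward the one-step return

  y_t^{(1)} = c(x_t, u_t) + \gamma V_\theta(x_{t+1}),

while a VFMPC(n)-style update would use the n-step return evaluated along the model-predicted optimal trajectory,

  y_t^{(n)} = \sum_{k=0}^{n-1} \gamma^k c(x_{t+k}, u_{t+k}) + \gamma^n V_\theta(x_{t+n}).

The learned V_\theta would then enter the MPC as the terminal term of its objective, e.g.

  \min_{u_0, \dots, u_{N-1}} \sum_{k=0}^{N-1} \gamma^k c(x_k, u_k) + \gamma^N V_\theta(x_N),

subject to the model dynamics x_{k+1} = f(x_k, u_k) and the operating constraints.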
ISSN: 0098-1354
DOI: 10.1016/j.compchemeng.2025.109262