Learning Constrained Parametric Differentiable Predictive Control Policies With Guarantees
| Published in | IEEE Transactions on Systems, Man, and Cybernetics: Systems, Vol. 54, no. 6, pp. 3596–3607 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | New York: IEEE, 01.06.2024 (The Institute of Electrical and Electronics Engineers, Inc.) |
| Summary | We present differentiable predictive control (DPC), a method for offline learning of constrained neural control policies for nonlinear dynamical systems with performance guarantees. We show that the sensitivities of the parametric optimal control problem can be used to obtain direct policy gradients. Specifically, we employ automatic differentiation (AD) to efficiently compute the sensitivities of the model predictive control (MPC) objective function and constraint penalties. To guarantee safety upon deployment, we derive probabilistic guarantees on closed-loop stability and constraint satisfaction based on indicator functions and Hoeffding's inequality. We empirically demonstrate that the proposed method can learn neural control policies for a range of parametric optimal control tasks: stabilizing systems with unstable dynamics, tracking time-varying references, and satisfying nonlinear state and input constraints. The DPC method offers practical time savings over alternative approaches to fast, memory-efficient controller design; unlike approximate MPC based on imitation learning, it does not depend on a supervisory controller. We demonstrate that, without losing performance, DPC scales with greatly reduced memory and computation demands compared to implicit and explicit MPC, while being more sample efficient than model-free reinforcement learning (RL) algorithms. |
| ISSN | 2168-2216; 2168-2232 |
| DOI | 10.1109/TSMC.2024.3368026 |
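The probabilistic guarantee mentioned in the summary rests on Hoeffding's inequality applied to indicator functions of closed-loop constraint violations. A minimal sketch of the resulting sample-size arithmetic, assuming i.i.d. closed-loop rollouts each scored by a 0/1 violation indicator (function names are illustrative, not from the paper):

```python
import math


def hoeffding_epsilon(m: int, delta: float) -> float:
    """Half-width of a (1 - delta) confidence interval on the true
    violation probability p after m i.i.d. indicator samples.
    Hoeffding: P(|p_hat - p| >= eps) <= 2 * exp(-2 * m * eps**2);
    setting the right-hand side to delta and solving for eps gives:"""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * m))


def rollouts_needed(eps: float, delta: float) -> int:
    """Smallest number of rollouts m so that the Hoeffding half-width
    is at most eps at confidence level 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


# Example: certify the empirical violation rate of a learned policy to
# within +/-1% with 95% confidence.
m = rollouts_needed(0.01, 0.05)        # 18445 closed-loop rollouts
eps = hoeffding_epsilon(m, 0.05)       # <= 0.01 by construction
```

The bound is distribution-free, which is why indicator samples from simulated closed-loop rollouts suffice; the price is the conservative, square-root dependence of the half-width on the number of rollouts.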