Safe reinforcement learning: A control barrier function optimization approach

Bibliographic Details
Published in	International journal of robust and nonlinear control Vol. 31; no. 6; pp. 1923 - 1940
Main Authors	Marvi, Zahra; Kiumarsi, Bahare
Format	Journal Article
Language	English
Published	Bognor Regis: Wiley Subscription Services, Inc, 01.04.2021
Subjects	actor/critic; Algorithms; control barrier function; Control stability; Control systems design; Controllers; Cost function; Damping; Lane keeping; Machine learning; Optimization; reinforcement learning; Safety; System dynamics

Summary:	This article presents a learning‐based, barrier‐certified method for learning safe optimal controllers that keep safety‐critical systems within their safe regions while providing optimal performance. The cost function that encodes the designer's objectives is augmented with a control barrier function (CBF) to ensure safety and optimality. A damping coefficient incorporated into the CBF specifies the trade‐off between safety and optimality. The proposed formulation provides look‐ahead, proactive safety planning and results in a smooth transition of states within the feasible set. That is, instead of applying an optimal controller and intervening only when the safety constraints are violated, safety is planned and optimized along with performance to minimize intervention with the optimal controller. It is shown that adding the CBF to the cost function does not affect the stability and optimality of the designed controller within the safe region. This formulation makes it possible to find the optimal safe solution iteratively. An off‐policy reinforcement learning (RL) algorithm is then employed to find a safe optimal policy that satisfies the safety constraints without requiring complete knowledge of the system dynamics. The efficacy of the proposed safe RL control design approach is demonstrated on lane keeping, an automotive control problem.
ISSN:	1049-8923 1099-1239
DOI:	10.1002/rnc.5132
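
Illustration:	The summary describes augmenting a performance cost with a control barrier function. The sketch below shows one generic way such an augmented stage cost might look for a scalar state, such as a lateral offset in lane keeping. It is not taken from the article: the barrier shape, the function names, and the weight `gamma` (standing in loosely for a safety/optimality trade-off parameter) are all assumptions made for illustration.

```python
import math


def barrier(x, x_max):
    """Hypothetical barrier for the safe set |x| < x_max.

    Equals 0 at x = 0 and grows without bound as |x| approaches x_max.
    This shape is an assumption, not the article's CBF.
    """
    if abs(x) >= x_max:
        return math.inf
    return -math.log(1.0 - (x / x_max) ** 2)


def augmented_stage_cost(x, u, q=1.0, r=1.0, gamma=0.5, x_max=1.0):
    """Quadratic performance cost plus a weighted barrier term.

    q, r   : assumed state and input weights of the performance cost
    gamma  : assumed positive weight on the barrier term (illustrative)
    x_max  : assumed bound defining the safe region
    """
    performance = q * x * x + r * u * u
    return performance + gamma * barrier(x, x_max)


if __name__ == "__main__":
    # The barrier term leaves the cost at the origin unchanged and
    # penalizes states more heavily as they approach the boundary.
    for offset in (0.0, 0.5, 0.9):
        print(offset, augmented_stage_cost(offset, 0.0))
```

In this sketch the barrier vanishes at the origin, so the augmented cost there matches the plain quadratic cost, while states near the boundary become increasingly expensive. A larger `gamma` makes the penalty take effect further from the boundary.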