Robust reinforcement learning with UUB guarantee for safe motion control of autonomous robots

Bibliographic Details
Published in	Science China. Technological sciences Vol. 67; no. 1; pp. 172 - 182
Main Authors	Zhang, RuiXian; Han, YiNing; Su, Man; Lin, ZeFeng; Li, HaoWei; Zhang, LiXian
Format	Journal Article
Language	English
Published	Beijing: Science China Press, 2024; Springer Nature B.V.
Subjects	Algorithms; Control tasks; Cost function; Criteria; Disturbances; Engineering; Motion control; Regularization; Robot control; Robot dynamics; Robustness; Safety; robustness; motion control; stability; reinforcement learning
Summary:	This paper addresses safety in reinforcement learning (RL) under disturbances and its application to the safety-constrained motion control of autonomous robots. To this end, a robust Lyapunov value function (rLVF) is proposed, obtained by evaluating a data-based Lyapunov value function (LVF) under the worst-case disturbance of the observed state. Using the rLVF, a uniformly ultimate boundedness (UUB) criterion is established. This criterion ensures that the cost function, which serves as the safety criterion, ultimately converges to a bounded range under the policy to be designed. Moreover, to mitigate drastic variation of the rLVF caused by differences between states, a smoothing regularization of the rLVF is introduced. To train policies with safety guarantees under the worst-case disturbances of the observed states, an off-policy robust RL algorithm is proposed. The proposed algorithm is applied to motion control tasks of an autonomous vehicle and a cartpole, which involve external disturbances and variations of the model parameters, respectively. The experimental results demonstrate the effectiveness of the theoretical findings and the advantages of the proposed algorithm in terms of robustness and safety.
ISSN:	1674-7321 1869-1900
DOI:	10.1007/s11431-023-2435-3
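
Note on the UUB criterion: the summary states that the cost function, used as the safety criterion, ultimately converges to a range even under worst-case disturbances. The following is a minimal sketch of that idea only. It is not the paper's algorithm: the scalar system, the quadratic cost, the parameter values, and all function names are assumptions chosen for illustration.

```python
def simulate_worst_case(x0, a=0.8, w_max=0.1, steps=200):
    """Roll out the toy system x[k+1] = a*x[k] + w[k] with |w[k]| <= w_max.

    The disturbance is chosen adversarially: it always pushes the state
    away from the origin, which maximises |x[k+1]| when 0 < a < 1.
    """
    xs = [x0]
    x = x0
    for _ in range(steps):
        w = w_max if x >= 0 else -w_max  # worst-case disturbance (assumed form)
        x = a * x + w
        xs.append(x)
    return xs


def ultimate_bound(a=0.8, w_max=0.1):
    """Ultimate bound on the cost x**2 for the toy system (requires |a| < 1)."""
    return (w_max / (1.0 - abs(a))) ** 2


def settling_index(costs, bound, tol=1e-6):
    """Smallest index T with costs[k] <= bound + tol for every k >= T, else None."""
    T = None
    for k in range(len(costs) - 1, -1, -1):
        if costs[k] <= bound + tol:
            T = k
        else:
            break
    return T


if __name__ == "__main__":
    costs = [x * x for x in simulate_worst_case(5.0)]
    bound = ultimate_bound()
    T = settling_index(costs, bound)
    print(f"initial cost {costs[0]:.3f}, ultimate bound {bound:.3f}, settles at step {T}")
```

In this toy setting the cost starts well outside the bound, enters it after finitely many steps, and stays there, which is the qualitative behaviour a UUB guarantee describes. A trajectory that starts inside the bound never leaves it.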