A novel agent with formal goal-reaching guarantees: an experimental study with a mobile robot
Main Authors | , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 23.09.2024 |
Summary: | Reinforcement Learning (RL) has been shown to be effective and convenient for
a number of tasks in robotics. However, it requires the exploration of a
sufficiently large number of state-action pairs, many of which may be unsafe or
unimportant. For instance, online model-free learning can be hazardous and
inefficient in the absence of guarantees that a certain set of desired states
will be reached during an episode. An increasingly common approach to address
safety involves the addition of a shielding system that constrains the RL
actions to a safe set of actions. In turn, a difficulty for such frameworks is
how to effectively couple RL with the shielding system to make sure the
exploration is not excessively restricted. This work presents a novel safe
model-free RL agent called Critic As Lyapunov Function (CALF) and showcases how
CALF can be used to improve upon control baselines in robotics in an efficient
and convenient fashion while guaranteeing stable goal reaching, which is
generally regarded as a crucial part of safety. With CALF, all
state-action pairs remain explorable, yet reaching the desired goal states is
formally guaranteed. A formal analysis establishes the goal-stabilizing
properties of CALF, and a set of real-world and numerical
experiments with a non-holonomic wheeled mobile robot (WMR), the TurtleBot3 Burger,
shows that CALF outperforms both proximal policy optimization (PPO), a
well-established RL agent, and a modified version of SARSA in a
few-episode setting in terms of attained total cost. |
---|---|
DOI: | 10.48550/arxiv.2409.14867 |