Learning Setup Policies: Reliable Transition Between Locomotion Behaviours

Bibliographic Details
Published in	IEEE robotics and automation letters Vol. 7; no. 4; pp. 1 - 8
Main Authors	Tidd, Brendan; Hudson, Nicolas; Cosgun, Akansel; Leitner, Jurgen
Format	Journal Article
Language	English
Published	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.10.2022
Subjects	Ablation; Deep learning; Humanoid and bipedal locomotion; Legged locomotion; Locomotion; Policies; Reinforcement learning; Robots; Switches; Task analysis; Terrain; Training; Trajectory; vision-based navigation

More Information
Summary:	Dynamic platforms that operate over many unique terrain conditions typically require many behaviours. To transition safely, there must be an overlap of states between adjacent controllers. We develop a novel method for training setup policies that bridge the trajectories between pre-trained Deep Reinforcement Learning (DRL) policies. We demonstrate our method with a simulated biped traversing a difficult jump terrain, where a single policy fails to learn the task, and switching between pre-trained policies without setup policies also fails. We perform an ablation of key components of our system, and show that our method outperforms others that learn transition policies. We demonstrate our method with several difficult and diverse terrain types, and show that we can use setup policies as part of a modular control suite to successfully traverse a sequence of complex terrains. We show that using setup policies improves the success rate for traversing a single difficult jump terrain (from 51.3% success rate with the best comparative method to 82.2%), and traversing a random sequence of difficult obstacles (from 1.9% without setup policies to 71.2%).
ISSN:	2377-3766
DOI:	10.1109/LRA.2022.3207567