Learning of Parameters in Behavior Trees for Movement Skills

Bibliographic Details
Published in	arXiv.org
Main Authors	Mayr, Matthias; Chatzilygeroudis, Konstantinos; Ahmad, Faseeh; Nardi, Luigi; Krueger, Volker
Format	Paper; Journal Article
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 02.08.2022
Subjects	Algorithms; Computer Science - Learning; Computer Science - Robotics; Computer simulation; Learning; Neural networks; Obstacle avoidance; Parameters; Policies; Robots; Skills; Workstations

Summary:	Reinforcement Learning (RL) is a powerful mathematical framework that allows robots to learn complex skills by trial and error. Despite numerous successes in many applications, RL algorithms still require thousands of trials to converge to high-performing policies and can produce dangerous behaviors while learning, and the optimized policies (usually modeled as neural networks) give almost no explanation when they fail to perform the task. For these reasons, RL is not commonly adopted in industrial settings. Behavior Trees (BTs), on the other hand, can provide a policy representation that a) supports modular and composable skills, b) allows for easy interpretation of the robot actions, and c) provides an advantageous low-dimensional parameter space. In this paper, we present a novel algorithm that can learn the parameters of a BT policy in simulation and then generalize to the physical robot without any additional training. We leverage a physical simulator with a digital twin of our workstation, and optimize the relevant parameters with a black-box optimizer. We showcase the efficacy of our method with a 7-DOF KUKA-iiwa manipulator in a task that includes obstacle avoidance and a contact-rich insertion (peg-in-hole), in which our method outperforms the baselines.
ISSN:	2331-8422
DOI:	10.48550/arxiv.2109.13050
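
Illustration:	The summary describes tuning the parameters of a BT policy in simulation with a black-box optimizer. The sketch below is a minimal toy version of that idea and is not the authors' implementation. Everything in it is an assumption made for illustration: the 2-D point robot, the circular obstacle, the reward (negative path length, with a fixed penalty for a collision or timeout), and the names MoveTo, Sequence, rollout, and random_search. Plain random search stands in for the paper's optimizer, which the summary does not name.

```python
import math
import random

# Toy "simulator" (assumed): a point robot, one circular obstacle, one goal.
START, GOAL = (0.0, 0.0), (1.0, 0.0)
OBSTACLE, RADIUS = (0.5, 0.0), 0.2
STEP = 0.02  # distance travelled per tick


class MoveTo:
    """Leaf node: step toward a target; SUCCESS once the target is reached."""

    def __init__(self, target):
        self.target = target

    def tick(self, state):
        dx = self.target[0] - state["pos"][0]
        dy = self.target[1] - state["pos"][1]
        dist = math.hypot(dx, dy)
        if dist <= STEP:
            # Close enough: snap to the target and report success.
            state["pos"] = self.target
            state["path"] += dist
            return "SUCCESS"
        state["pos"] = (state["pos"][0] + STEP * dx / dist,
                        state["pos"][1] + STEP * dy / dist)
        state["path"] += STEP
        return "RUNNING"


class Sequence:
    """Control node: tick the current child; advance when it succeeds."""

    def __init__(self, children):
        self.children = children
        self.current = 0

    def tick(self, state):
        status = self.children[self.current].tick(state)
        if status == "SUCCESS":
            self.current += 1
            if self.current == len(self.children):
                return "SUCCESS"
            return "RUNNING"
        return status


def build_tree(params):
    """The only learnable parameter here is the waypoint's y-offset."""
    waypoint = (0.5, params[0])
    return Sequence([MoveTo(waypoint), MoveTo(GOAL)])


def rollout(params, max_ticks=500):
    """Run one simulated episode and return its reward."""
    state = {"pos": START, "path": 0.0}
    tree = build_tree(params)
    for _ in range(max_ticks):
        status = tree.tick(state)
        if math.dist(state["pos"], OBSTACLE) < RADIUS:
            return -10.0  # collision penalty
        if status == "SUCCESS":
            return -state["path"]  # shorter paths score higher
    return -10.0  # timeout is penalized like a collision


def random_search(n_trials=200, seed=0):
    """Black-box optimization: only rollout() rewards are used, no gradients."""
    rng = random.Random(seed)
    best_params, best_reward = None, -math.inf
    for _ in range(n_trials):
        params = [rng.uniform(-1.0, 1.0)]
        reward = rollout(params)
        if reward > best_reward:
            best_params, best_reward = params, reward
    return best_params, best_reward


if __name__ == "__main__":
    params, reward = random_search()
    print(f"waypoint y-offset: {params[0]:.3f}, reward: {reward:.3f}")
```

Because the policy is a tree with a single named parameter, the optimized value can be read directly as "pass the obstacle at this offset", which is the kind of interpretability and low-dimensional search space the summary attributes to BTs. A more sample-efficient optimizer could replace random_search without changing build_tree or rollout.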