Learning Navigation Behaviors End-to-End With AutoRL

Bibliographic Details
Published in: IEEE Robotics and Automation Letters, Vol. 4, No. 2, pp. 2007-2014
Main Authors: Chiang, Hao-Tien Lewis; Faust, Aleksandra; Fiser, Marek; Francis, Anthony
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.04.2019

More Information
Summary: We learn end-to-end point-to-point and path-following navigation behaviors that avoid moving obstacles. These policies receive noisy lidar observations and output robot linear and angular velocities. The policies are trained in small, static environments with AutoRL, an evolutionary automation layer around reinforcement learning (RL) that searches for a deep RL reward and neural network architecture with large-scale hyper-parameter optimization. AutoRL first finds a reward that maximizes task completion and then finds a neural network architecture that maximizes the cumulative return under that reward. Empirical evaluations, both in simulation and on-robot, show that AutoRL policies do not suffer from the catastrophic forgetting that plagues many other deep RL algorithms, generalize to new environments and moving obstacles, are robust to sensor, actuator, and localization noise, and can serve as robust building blocks for larger navigation tasks. Our path-following and point-to-point policies are, respectively, 23% and 26% more successful than comparison methods across new environments.
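
The summary describes a two-phase search: AutoRL first evolves the reward (selecting on task completion), then, with that reward fixed, evolves the network architecture (selecting on cumulative reward). The Python sketch below illustrates only that control flow under assumed details: the function names (sample_reward_weights, sample_architecture, train_and_eval), the search spaces, and the dummy evaluation are all hypothetical stand-ins, not the authors' implementation, whose inner loop is a full deep-RL training run.

import random

def sample_reward_weights(best=None, scale=0.1):
    # Phase-1 search space: weights over hand-chosen reward components
    # (e.g., goal progress, collision penalty). The component set here
    # is an assumption; the paper searches a parameterized reward.
    if best is None:
        return [random.uniform(0.0, 1.0) for _ in range(4)]
    # Mutate the current best candidate (a simple evolutionary step).
    return [max(0.0, w + random.gauss(0.0, scale)) for w in best]

def sample_architecture():
    # Phase-2 search space (assumed): number and width of hidden layers.
    return [random.choice([64, 128, 256]) for _ in range(random.randint(1, 3))]

def train_and_eval(reward_weights, architecture):
    # Stand-in for a full deep-RL training run in the navigation simulator.
    # Returns (task_completion_rate, cumulative_reward); both are noisy
    # dummy scores here so the sketch executes end to end.
    completion = random.random()
    cumulative = sum(reward_weights) * len(architecture) + random.random()
    return completion, cumulative

def autorl_search(generations=5, population=4):
    # Phase 1: evolve the reward, selecting on task completion alone.
    default_arch = [128, 128]
    best_w, best_completion = sample_reward_weights(), -1.0
    for _ in range(generations):
        for _ in range(population):
            w = sample_reward_weights(best=best_w)
            completion, _ = train_and_eval(w, default_arch)
            if completion > best_completion:
                best_w, best_completion = w, completion

    # Phase 2: freeze the found reward and evolve the architecture,
    # now selecting on the cumulative return under that reward.
    best_arch, best_return = default_arch, float("-inf")
    for _ in range(generations):
        for _ in range(population):
            arch = sample_architecture()
            _, cumulative = train_and_eval(best_w, arch)
            if cumulative > best_return:
                best_arch, best_return = arch, cumulative
    return best_w, best_arch

if __name__ == "__main__":
    reward, arch = autorl_search()
    print("found reward weights:", reward, "architecture:", arch)

In the real system each train_and_eval call is a complete RL training run, which is why the summary emphasizes large-scale hyper-parameter optimization: the two nested searches are only tractable when many such runs execute in parallel.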
ISSN: 2377-3766
DOI: 10.1109/LRA.2019.2899918