Trade-Off Between Robustness and Rewards Adversarial Training for Deep Reinforcement Learning Under Large Perturbations

Bibliographic Details
Published in	IEEE Robotics and Automation Letters, Vol. 8, no. 12, pp. 8018-8025
Main Authors	Huang, Jeffrey; Choi, Ho Jin; Figueroa, Nadia
Format	Journal Article
Language	English
Published	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.12.2023
Subjects	Adversarial machine learning; Cart-pole problem; Deep learning; Manipulators; Perturbation; Perturbation methods; Reinforcement learning; Robot sensing systems; Robotic manipulation; Robust control; Robustness; Tradeoffs
Summary:	Deep Reinforcement Learning (DRL) has become a popular approach for training robots due to its promise of generalization, its capacity for complex tasks, and its minimal need for human intervention. Nevertheless, DRL-trained controllers are vulnerable to even the smallest perturbations of their inputs, which can lead to catastrophic failures in real-world human-centric environments with large and unexpected perturbations. In this work, we study the vulnerability of state-of-the-art DRL subject to large perturbations and propose a novel adversarial training framework for robust control. Our approach generates aggressive attacks on the state space and the expected state-action values to emulate real-world perturbations such as sensor noise, perception failures, physical perturbations, and observation mismatch. To achieve this, we reformulate the adversarial risk to yield a trade-off between rewards and robustness (TBRR). We show that TBRR-aided DRL training is robust to aggressive attacks and outperforms baselines on standard DRL benchmarks (Cartpole, Pendulum), Meta-World tasks (door manipulation), and a vision-based grasping task with a 7DoF manipulator. Finally, we show that the vision-based grasping task trained in simulation via TBRR transfers sim2real with a 70% success rate when subject to sensor impairment and physical perturbations, without any retraining.
ISSN:	2377-3766
DOI:	10.1109/LRA.2023.3324590
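
Illustration:	The Summary describes a trade-off between rewards and robustness under attacks on the state space. The sketch below is only a toy illustration of that general idea, not the TBRR formulation from the article, which this record does not state. It assumes a hypothetical linear value function, an L-infinity attack budget `epsilon`, and a mixing weight `lam`; all names and the convex-combination form are assumptions made for the example.

```python
def q_value(weights, state):
    """Hypothetical linear stand-in for a learned value: Q(s) = w . s."""
    return sum(w * s for w, s in zip(weights, state))


def worst_case_state(weights, state, epsilon):
    """Worst-case state under an L-infinity budget epsilon.

    For a linear Q the minimizing attack moves every coordinate by
    epsilon against the sign of its weight (assumption: linear model).
    """
    def sign(x):
        return 1.0 if x > 0 else -1.0 if x < 0 else 0.0

    return [s - epsilon * sign(w) for w, s in zip(weights, state)]


def tradeoff_objective(weights, state, epsilon, lam):
    """Blend nominal and attacked values with weight lam in [0, 1].

    lam = 0 scores only the unperturbed state (pure reward);
    lam = 1 scores only the attacked state (pure robustness).
    """
    nominal = q_value(weights, state)
    attacked = q_value(weights, worst_case_state(weights, state, epsilon))
    return (1.0 - lam) * nominal + lam * attacked


if __name__ == "__main__":
    w = [1.0, -2.0]
    s = [0.5, 0.25]
    for lam in (0.0, 0.5, 1.0):
        print(lam, tradeoff_objective(w, s, epsilon=0.1, lam=lam))
```

In this toy setting the attacked value equals the nominal value minus `epsilon` times the L1 norm of the weights, so raising `lam` or `epsilon` lowers the objective. That is the tension the Summary refers to: weighting robustness more heavily costs nominal reward.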