Trade-Off Between Robustness and Rewards Adversarial Training for Deep Reinforcement Learning Under Large Perturbations
Deep Reinforcement Learning (DRL) has become a popular approach for training robots due to its generalization promise, complex task capacity and minimal human intervention. Nevertheless, DRL-trained controllers are vulnerable to even the smallest of perturbations on its inputs which can lead to cata...
Saved in:
Published in | IEEE robotics and automation letters Vol. 8; no. 12; pp. 8018 - 8025 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published |
Piscataway
IEEE
01.12.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Deep Reinforcement Learning (DRL) has become a popular approach for training robots due to its generalization promise, complex task capacity and minimal human intervention. Nevertheless, DRL-trained controllers are vulnerable to even the smallest of perturbations on its inputs which can lead to catastrophic failures in real-world human-centric environments with large and unexpected perturbations. In this work, we study the vulnerability of state-of-the-art DRL subject to large perturbations and propose a novel adversarial training framework for robust control. Our approach generates aggressive attacks on the state space and the expected state-action values to emulate real-world perturbations such as sensor noise, perception failures, physical perturbations, observations mismatch, etc. To achieve this, we reformulate the adversarial risk to yield a trade-off between rewards and robustness (TBRR). We show that TBRR-aided DRL training is robust to aggressive attacks and outperforms baselines on standard DRL benchmarks (Cartpole, Pendulum), Meta-World tasks (door manipulation) and a vision-based grasping task with a 7DoF manipulator. Finally, we show that the vision-based grasping task trained in simulation via TBRR transfers sim2real with 70% success rate subject to sensor impairment and physical perturbations without any retraining. |
---|---|
Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
ISSN: | 2377-3766 2377-3766 |
DOI: | 10.1109/LRA.2023.3324590 |