DrS: Learning Reusable Dense Rewards for Multi-Stage Tasks
Main Authors | , , |
---|---|
Format | Journal Article |
Language | English |
Published | 25.04.2024 |
Subjects | |
Summary: | The success of many RL techniques heavily relies on human-engineered dense
rewards, which typically demand substantial domain expertise and extensive
trial and error. In our work, we propose DrS (Dense reward learning from
Stages), a novel approach for learning reusable dense rewards for multi-stage
tasks in a data-driven manner. By leveraging the stage structures of the task,
DrS learns a high-quality dense reward from sparse rewards and, when available,
demonstrations. The learned rewards can be reused in unseen tasks, thus
reducing the human effort for reward engineering. Extensive experiments on
three physical robot manipulation task families with 1000+ task variants
demonstrate that our learned rewards can be reused in unseen tasks, resulting
in improved performance and sample efficiency of RL algorithms. The learned
rewards even achieve comparable performance to human-engineered rewards on some
tasks. See our project page (https://sites.google.com/view/iclr24drs) for more
details. |
---|---|
DOI: | 10.48550/arxiv.2404.16779 |