Adaptive cooperative exploration for reinforcement learning from imperfect demonstrations
Published in | Pattern recognition letters Vol. 165; pp. 176 - 182 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | Elsevier B.V. 01.01.2023 |
Summary: | • Proposing an adaptive cooperative exploration method to guide policy learning with imperfect demonstrations.
• Encouraging agents to explore efficiently through a cooperative learning module.
• Exploring the demonstrations effectively through an adaptive self-supervised exploration method.
• Demonstrating the performance of the approach on multiple control tasks.
In reinforcement learning, exploration is an important way to learn new skills, but it is usually inefficient when faced with a huge state-action space or sparse extrinsic rewards. Expert demonstrations can generally assist policy learning by leading the agent to imitate or explore these data. However, demonstrations are often imperfect because of data collection noise or an immature expert.
To this end, we propose a novel adaptive cooperative exploration method that effectively alleviates the issues caused by imperfect demonstrations and improves policy learning with them. Specifically, we propose a cooperative learning module that encourages two agents to explore diversely and then fuses the learned policies. Meanwhile, an adaptive self-supervised exploration method is presented to dynamically explore the demonstrations while taking environmental feedback into account. The proposed method can therefore make effective use of imperfect demonstrations for policy learning. Experimental results on the MuJoCo benchmark demonstrate the effectiveness of the proposed method. |
---|---|
ISSN: | 0167-8655 1872-7344 |
DOI: | 10.1016/j.patrec.2022.12.003 |