Adaptive cooperative exploration for reinforcement learning from imperfect demonstrations

Bibliographic Details
Published in Pattern Recognition Letters Vol. 165; pp. 176-182
Main Authors Huang, Fuxian, Ji, Naye, Ni, Huajian, Li, Shijian, Li, Xi
Format Journal Article
Language English
Published Elsevier B.V. 01.01.2023
Summary:
• Proposing an adaptive cooperative exploration method to guide policy learning with imperfect demonstrations.
• Encouraging agents to explore efficiently based on a cooperative learning module.
• Exploring the demonstrations effectively based on an adaptive self-supervised exploration method.
• Performance of our approach is demonstrated on multiple control tasks.

In reinforcement learning, exploration is an important way to learn new skills, but it is usually inefficient when faced with a huge state-action space or sparse extrinsic rewards. Expert demonstrations can generally assist policy learning by leading the agent to imitate or explore these data. However, demonstrations are often imperfect because of data collection noise or an immature expert. To this end, we propose a novel adaptive cooperative exploration method that can effectively alleviate the issues of imperfect demonstrations and improve policy learning with them. Specifically, we propose a cooperative learning module that encourages two agents to explore diversely and then fuses the learned policies. Meanwhile, an adaptive self-supervised exploration method is presented to dynamically explore the demonstrations in light of environmental feedback. The proposed method can therefore make effective use of imperfect demonstrations for policy learning. Experimental results demonstrate the effectiveness of the proposed method on the MuJoCo benchmark.
ISSN:0167-8655
1872-7344
DOI:10.1016/j.patrec.2022.12.003