Human skill knowledge guided global trajectory policy reinforcement learning method

Bibliographic Details
Published in: Frontiers in Neurorobotics, Vol. 18, p. 1368243
Main Authors: Zang, Yajing; Wang, Pengfei; Zha, Fusheng; Guo, Wei; Li, Chuanfeng; Sun, Lining
Format: Journal Article
Language: English
Published: Switzerland: Frontiers Media S.A., 2024
Summary: Traditional trajectory learning methods based on Imitation Learning (IL) only learn existing trajectory knowledge from human demonstrations; they cannot adapt that knowledge to the task environment by interacting with it and fine-tuning the policy. To address this problem, a global trajectory learning method that combines IL with Reinforcement Learning (RL) to adapt the knowledge policy to the environment is proposed. In this method, IL is first used to acquire basic trajectory skills, and the agent then explores and exploits through RL to learn a policy better suited to the current environment. The basic trajectory skills comprise the knowledge policy and the time-stage information over the whole task space; they help learn the time series of the trajectory and guide the subsequent RL process. Notably, neural networks are not used to model the action policy and the Q value during the RL process. Instead, the policy and Q value are sampled and updated over the whole task space and, after the RL process, transferred to networks through Behavior Cloning (BC) to obtain a continuous and smooth global trajectory policy. The feasibility and effectiveness of the method were validated in a custom Gym environment of a flower-drawing task, and the learned policy was then executed in a real-world robot drawing experiment.
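The abstract only outlines the pipeline (demonstration-seeded policy and Q value over the task space, tabular RL updates without a network, then distillation into a function approximator via BC). The sketch below illustrates that general idea under stated assumptions; it is not the authors' code. The toy dynamics in step(), the discretization sizes, the stand-in demonstration, and the least-squares fit used in place of a neural network for the BC stage are all illustrative assumptions, as is the helper name global_policy.

# Illustrative sketch only (not the paper's implementation):
# IL-seeded tabular Q values, task-space Q-learning, then behavior
# cloning of the tabular policy into a smooth function approximator.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 50, 4                       # discretized task space (assumption)
demo = [(s, s % n_actions) for s in range(n_states)]   # stand-in human demonstration

# 1) Imitation stage: bias Q toward the demonstrated actions ("basic trajectory skills").
Q = np.zeros((n_states, n_actions))
for s, a in demo:
    Q[s, a] = 1.0

# 2) RL stage: tabular Q-learning over the sampled task space (no neural network here).
def step(s, a):
    # Toy dynamics and reward standing in for the drawing-task environment.
    s_next = (s + 1) % n_states
    r = 1.0 if a == s % n_actions else -0.1
    return s_next, r

alpha, gamma, eps = 0.1, 0.95, 0.1
for _ in range(200):                              # episodes
    s = 0
    for _ in range(n_states):
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s_next, r = step(s, a)
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

# 3) BC stage: transfer the tabular policy to a smooth approximator; a least-squares
#    fit on one-hot state features stands in for the neural network used in the paper.
X = np.eye(n_states)
Y = np.eye(n_actions)[Q.argmax(axis=1)]
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def global_policy(s):
    # Global policy obtained after the RL process.
    return int((X[s] @ W).argmax())

print([global_policy(s) for s in range(5)])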
ISSN: 1662-5218
DOI: 10.3389/fnbot.2024.1368243