Deep reinforcement learning-based rehabilitation robot trajectory planning with optimized reward functions

Bibliographic Details
Published in	Advances in mechanical engineering Vol. 13; no. 12; p. 168781402110670
Main Authors	Wang, Xusheng, Xie, Jiexin, Guo, Shijie, Li, Yue, Sun, Pengfei, Gan, Zhongxue
Format	Journal Article
Language	English
Published	London, England: SAGE Publications, 01.12.2021
Subjects	Convergence; Deep learning; Efficiency; Optimization; Rehabilitation; Rehabilitation robots; Robots; Trajectory planning; Working conditions; Deep reinforcement learning; Reward function
Summary:	Deep reinforcement learning (DRL) offers a new approach to trajectory planning for rehabilitation robots in unstructured working environments, which can bring great convenience to patients. Previous research has mainly focused on optimization strategies while neglecting the construction of reward functions, which leads to low learning efficiency. In contrast to the traditional sparse reward function, this paper proposes two dense reward functions. First, an azimuth reward function provides global guidance and reasonable constraints during exploration. Second, to further improve efficiency, a process-oriented aspiration reward function is proposed; it accelerates the exploration process and helps avoid locally optimal solutions. Experiments show that, with mainstream DRL methods, the proposed reward functions accelerate convergence by 38.4% on average. The mean of convergence also increases by 9.5%, and the percentage of standard deviation decreases by 21.2%–23.3%. These results show that the proposed reward functions can significantly improve the learning efficiency of DRL methods and thus make automatic trajectory planning for rehabilitation robots practically feasible.
ISSN:	1687-8132, 1687-8140
DOI:	10.1177/16878140211067011
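
The summary contrasts a traditional sparse reward with dense rewards that guide exploration at every step. This record does not give the paper's formulas, so the following is only a minimal sketch of that general contrast, not the authors' azimuth or aspiration reward functions. The function names, the cosine-based direction term, the distance-progress term, and the `tolerance` parameter are all assumptions made for illustration.

```python
import math


def sparse_reward(position, goal, tolerance=0.01):
    """Sparse baseline: reward only when the end effector reaches the goal.

    `tolerance` is a hypothetical success radius, not a value from the paper.
    """
    return 1.0 if math.dist(position, goal) <= tolerance else 0.0


def direction_reward(prev_position, position, goal):
    """Hypothetical dense term: cosine of the angle between the step taken
    and the straight-line direction from the previous position to the goal.

    Returns a value in [-1, 1]; 0.0 if there was no motion or the previous
    position was already at the goal.
    """
    step = [b - a for a, b in zip(prev_position, position)]
    to_goal = [g - a for a, g in zip(prev_position, goal)]
    step_norm = math.hypot(*step)
    goal_norm = math.hypot(*to_goal)
    if step_norm == 0.0 or goal_norm == 0.0:
        return 0.0
    dot = sum(s * t for s, t in zip(step, to_goal))
    return dot / (step_norm * goal_norm)


def progress_reward(prev_position, position, goal):
    """Hypothetical process-oriented term: how much closer to the goal
    the latest step brought the end effector (negative if it moved away)."""
    return math.dist(prev_position, goal) - math.dist(position, goal)
```

With a sparse reward, every step that does not reach the goal returns zero, so the agent receives no signal about which direction is promising. Dense terms like the two sketched here return a nonzero value on almost every step, which is the property the summary credits for faster convergence.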