Reinforcement learning in dual-arm trajectory planning for a free-floating space robot

Bibliographic Details
Published in	Aerospace science and technology Vol. 98; p. 105657
Main Authors	Wu, Yun-Hua; Yu, Zhi-Cheng; Li, Chao-Yong; He, Meng-Jie; Hua, Bing; Chen, Zhi-Ming
Format	Journal Article
Language	English
Published	Elsevier Masson SAS 01.03.2020
Subjects	Dual-arm trajectory planning; Fixed and moving targets; Free-floating space robot; On-orbit servicing; Reinforcement learning
Summary:	A free-floating space robot exhibits strong dynamic coupling between the arm and the base, so the position of the end of the arm depends not only on the joint angles but also on the state of the base. Dynamic modeling is complicated for multi-degree-of-freedom (DOF) manipulators, especially for a space robot with two arms. Trajectories are therefore typically planned offline and tracked online. However, this approach is unsuitable when the target moves relative to the servicing space robot. To handle this case, a model-free reinforcement learning strategy is proposed that trains a policy for online trajectory planning without establishing the dynamic and kinematic models of the space robot. The learning algorithm learns a policy that maps states to actions via trial and error in a simulation environment. The learned policy, represented by a feedforward neural network with 2 hidden layers, lets the space robot schedule and perform actions quickly, which makes it suitable for real-time applications. The feasibility of the trained policy is demonstrated for both fixed and moving targets.
ISSN:	1270-9638 1626-3219
DOI:	10.1016/j.ast.2019.105657
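
Illustrative sketch:	The summary describes the learned policy as a feedforward neural network with 2 hidden layers that maps states to actions. The Python sketch below shows only that mapping, as a rough illustration and not the authors' implementation. The class name FeedforwardPolicy, the hidden-layer width of 64, the tanh activations, the random initialization, and the state and action dimensions in the usage note are all assumptions; the record does not specify them, nor the training algorithm used to fit the weights.

```python
import math
import random


class FeedforwardPolicy:
    """Hypothetical policy network: state vector -> bounded action vector."""

    def __init__(self, state_dim, action_dim, hidden=(64, 64), seed=0):
        # Two hidden layers by default, matching the summary; widths are assumed.
        rng = random.Random(seed)
        sizes = [state_dim, *hidden, action_dim]
        self.state_dim = state_dim
        self.layers = []
        for n_in, n_out in zip(sizes[:-1], sizes[1:]):
            scale = 1.0 / math.sqrt(n_in)  # small uniform initial weights
            weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                       for _ in range(n_out)]
            biases = [0.0] * n_out
            self.layers.append((weights, biases))

    def act(self, state):
        """Forward pass; tanh on every layer keeps each action in [-1, 1]."""
        if len(state) != self.state_dim:
            raise ValueError("state has the wrong dimension")
        x = list(state)
        for weights, biases in self.layers:
            x = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                 for row, b in zip(weights, biases)]
        return x
```

Under these assumptions, a call such as FeedforwardPolicy(state_dim=20, action_dim=12).act(state) would return 12 normalized commands (for example, scaled joint rates for two arms) from a 20-element state. A forward pass of this size is only a few matrix-vector products, which is consistent with the summary's point that a trained policy can be evaluated quickly enough for online planning.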