Sample-Efficient Deep Reinforcement Learning of Mobile Manipulation for 6-DOF Trajectory Following

Bibliographic Details
Published in: IEEE Transactions on Automation Science and Engineering, Vol. 22, pp. 11381-11391
Main Authors: Zhou, Yifan; Feng, Qiyu; Zhou, Yixuan; Lin, Jianghao; Liu, Zhe; Wang, Hesheng
Format: Journal Article
Language: English
Published: IEEE, 2025
Summary: The whole-body control of mobile manipulators for the 6-DOF trajectory following task is the basis of many continuous tasks. However, traditional control strategies rely on accurate models and expert knowledge to solve it. Deep reinforcement learning (DRL) provides a promising model-free alternative, but it is sample-inefficient. To this end, we propose Trajectory Following Hindsight Experience Replay (TF-HER), a sample-efficient DRL algorithm for the whole-body coupled trajectory following task with dense rewards. TF-HER builds a multi-trajectory state space and relabels low-reward data to generate informative high-reward experiences. The distributional shift caused by the relabeling is corrected by estimating the density ratio of the relabeled experiences. Extensive experiments on both nonholonomic and holonomic bases in simulation validate that our algorithm accelerates convergence and significantly improves sample efficiency. Furthermore, real-world experiments demonstrate the effectiveness of our approach. The code is available at https://github.com/IRMV-Manipulation-Group/TF-HER.

Note to Practitioners: Whole-body 6-DOF trajectory tracking for mobile manipulators is crucial in industrial automation, serving as the foundation for a wide range of continuous, precise operations, including automated assembly, welding, and material handling. This paper proposes a reinforcement learning approach that improves the efficiency and effectiveness of mobile manipulators while requiring no prior model information. Because the method is model-free, it also holds promise for other robotic systems. By combining a gradient-descent-based relabeling method with an adaptive density ratio estimator, we address distributional shift and mitigate hindsight bias, guiding the mobile manipulator to accurately follow complex 6-DOF trajectories under dense rewards. The experimental results highlight our method's superior sample efficiency and reduced tracking error relative to existing model-free techniques across diverse robotic platforms in both simulated and real-world settings. The successful policy transfer from simulation to physical robots with fine-tuning further validates its robustness and practical applicability.
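The abstract names two mechanisms: hindsight relabeling that turns low-reward experiences into informative high-reward ones, and density-ratio estimation to correct the distributional shift that relabeling introduces. The NumPy sketch below illustrates only that general pattern; it is not the authors' TF-HER implementation (see the repository linked above), and every function, class, and variable name in it is hypothetical. It assumes a dense reward equal to the negative pose error and estimates the density ratio with a standard logistic-regression classifier trick rather than the paper's adaptive estimator.

import numpy as np

rng = np.random.default_rng(0)


def dense_reward(achieved, desired):
    # Dense reward: negative distance between achieved and desired 6-DOF
    # poses (position + orientation stacked into one vector for brevity).
    return -np.linalg.norm(achieved - desired)


def relabel(batch):
    # Hindsight relabeling: substitute the achieved pose for the desired
    # one, so a low-reward transition becomes a high-reward example of
    # reaching the pose that was actually reached.
    relabeled = []
    for state, action, achieved, desired in batch:
        new_desired = achieved                        # hindsight goal
        reward = dense_reward(achieved, new_desired)  # now ~0 (maximal)
        relabeled.append((state, action, achieved, new_desired, reward))
    return relabeled


class DensityRatioEstimator:
    """Logistic-regression density-ratio estimator ("classifier trick").

    Trained to discriminate original goals (label 1) from relabeled goals
    (label 0); with balanced classes, the ratio p_orig(g) / p_relabeled(g)
    of a well-trained classifier equals exp(logit(g)).
    """

    def __init__(self, dim, lr=1e-2):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def _logit(self, goals):
        return goals @ self.w + self.b

    def update(self, goals_orig, goals_relab):
        # One SGD step of binary cross-entropy on both batches.
        for goals, label in ((goals_orig, 1.0), (goals_relab, 0.0)):
            prob = 1.0 / (1.0 + np.exp(-self._logit(goals)))
            err = prob - label                        # dBCE / dlogit
            self.w -= self.lr * (err[:, None] * goals).mean(axis=0)
            self.b -= self.lr * err.mean()

    def ratio(self, goals):
        return np.exp(self._logit(goals))


if __name__ == "__main__":
    DIM = 6
    # Toy transitions: (state, action, achieved_pose, desired_pose).
    # Achieved poses are offset from desired ones, mimicking a policy
    # that has not yet learned to track the reference trajectory.
    batch = [
        (rng.normal(size=DIM), rng.normal(size=DIM),
         rng.normal(size=DIM) + 0.5, 0.1 * rng.normal(size=DIM))
        for _ in range(256)
    ]
    relabeled = relabel(batch)

    goals_orig = np.stack([t[3] for t in batch])
    goals_relab = np.stack([t[3] for t in relabeled])

    estimator = DensityRatioEstimator(DIM)
    for _ in range(200):
        estimator.update(goals_orig, goals_relab)

    # Importance weights that would reweight the loss of relabeled
    # transitions in an off-policy critic update.
    weights = estimator.ratio(goals_relab)
    print("mean importance weight on relabeled goals:", round(weights.mean(), 3))

In an actual off-policy update, the estimated ratios would multiply the TD error of relabeled transitions, so the critic is not biased toward the relabeled goal distribution; this is the distributional-shift correction the abstract refers to, sketched here under the stated assumptions.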
ISSN: 1545-5955
EISSN: 1558-3783
DOI: 10.1109/TASE.2025.3530162