D2SR: Transferring Dense Reward Function to Sparse by Network Resetting

Bibliographic Details
Published in: 2023 IEEE International Conference on Real-time Computing and Robotics (RCAR), pp. 906-911
Main Authors: Luo, Yongle; Wang, Yuxin; Dong, Kun; Liu, Yu; Sun, Zhiyong; Zhang, Qiang; Song, Bo
Format: Conference Proceeding
Language: English
Published: IEEE, 17.07.2023
DOI: 10.1109/RCAR58764.2023.10249999

Summary: In Reinforcement Learning (RL), most algorithms use a fixed reward function, and few studies discuss transferring the reward function during learning. In practice, different types of reward functions have different characteristics. In general, a shaped dense reward function has the advantage of quickly guiding the agent to high-value states, but a well-shaped function is difficult to design and is susceptible to noise. A sparse reward has the advantages of being robust and consistent with the task, but it is less efficient during early exploration. Therefore, this paper proposes an algorithm called Dense2Sparse by Network Resetting (D2SR), which combines the efficiency of dense reward functions with the robustness of sparse rewards. Specifically, D2SR can rescue the agent from being misled by a suboptimal dense reward by resetting the network parameters and transferring the collected experience to the sparse reward, thereby achieving significant improvement toward the global optimum. In this study, through a series of ablation experiments on challenging robot manipulation tasks, we find that D2SR reduces the demands on dense reward function design and balances efficiency and performance in tasks with noisy rewards.
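
The abstract describes the transfer mechanism only at a high level. The following is a minimal Python sketch of one plausible reading, assuming an off-policy agent with a replay buffer: collect experience under the shaped dense reward, reset the network parameters at a switch point, keep the stored transitions, and continue training under the sparse task reward. The TinyAgent class, reward functions, switch schedule, and relabeling rule are illustrative placeholders, not details taken from the paper.

```python
# Sketch of the Dense2Sparse-by-Network-Resetting (D2SR) idea from the abstract:
# train with a shaped dense reward, then reset the networks, keep the collected
# experience, and continue training on the sparse task reward.
# All concrete details below (agent, rewards, schedule) are assumptions.
import random

class TinyAgent:
    """Stand-in for an off-policy actor-critic agent (e.g. SAC or TD3)."""
    def __init__(self):
        self.params = {"actor": 0.0, "critic": 0.0}  # placeholder "weights"

    def reset_networks(self):
        # Network resetting: re-initialize parameters so biases induced by a
        # suboptimal dense reward are discarded.
        self.params = {"actor": 0.0, "critic": 0.0}

    def update(self, batch):
        # Placeholder gradient step: nudge parameters toward the batch reward.
        mean_r = sum(r for (_, _, r) in batch) / len(batch)
        self.params["critic"] += 0.01 * mean_r
        self.params["actor"] += 0.01 * mean_r

def dense_reward(state):
    return -abs(state - 1.0)  # hand-shaped: negative distance to the goal

def sparse_reward(state):
    return 1.0 if abs(state - 1.0) < 0.05 else 0.0  # success indicator only

agent, buffer = TinyAgent(), []
SWITCH_STEP, TOTAL_STEPS = 500, 1000  # assumed switch schedule
state = 0.0
for step in range(TOTAL_STEPS):
    action = random.uniform(-0.1, 0.1)  # placeholder exploration policy
    state = max(-2.0, min(2.0, state + action))
    # Store both rewards so experience can be relabeled after the switch.
    buffer.append((state, action, dense_reward(state), sparse_reward(state)))

    if step == SWITCH_STEP:
        # D2SR transfer point: reset the networks but keep the replay buffer.
        agent.reset_networks()

    use_sparse = step >= SWITCH_STEP
    sample = random.sample(buffer, min(32, len(buffer)))
    batch = [(s, a, (rs if use_sparse else rd)) for (s, a, rd, rs) in sample]
    agent.update(batch)
```

Under this reading, the retained replay buffer carries forward the exploration benefit of the dense reward, while the reset removes whatever bias the shaped reward has baked into the networks before training resumes on the task-consistent sparse signal.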