D2SR: Transferring Dense Reward Function to Sparse by Network Resetting
Published in | 2023 IEEE International Conference on Real-time Computing and Robotics (RCAR) pp. 906 - 911 |
---|---|
Format | Conference Proceeding |
Language | English |
Published | IEEE, 17.07.2023 |
DOI | 10.1109/RCAR58764.2023.10249999 |
Summary: | In Reinforcement Learning (RL), most algorithms use a fixed reward function, and few studies discuss transferring the reward function during learning. Yet different types of reward functions have different characteristics. In general, a shaped dense reward function quickly guides the agent to a high-value state, but a well-shaped function is difficult to design and is susceptible to noise. A sparse reward is robust and consistent with the task, but is less efficient in early exploration. Therefore, this paper proposes an algorithm called Dense2Sparse by Network Resetting (D2SR), which combines the efficiency of dense reward functions with the robustness of sparse rewards. Specifically, D2SR rescues the agent from being misled by suboptimal dense rewards by resetting the network parameters and transferring experience to sparse rewards, thereby achieving significant improvements toward the global optimum. Through a series of ablation experiments on challenging robot manipulation tasks, we find that D2SR reduces the demands placed on dense reward function design and balances efficiency and performance in tasks with noisy rewards. |
---|---|
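
The summary describes the switch from dense to sparse rewards only at a high level: reset the network parameters and transfer the collected experience to sparse rewards. The Python sketch below illustrates one possible reading of that handoff and is not the authors' implementation. Every name in it (`Transition`, `dense_reward`, `sparse_reward`, `init_params`, `relabel`, `dense_to_sparse_handoff`) is hypothetical, the one-dimensional state is a toy stand-in for a robot state, and relabeling stored transitions with the sparse reward is an assumption about what "transferring experience" means.

```python
import random
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Transition:
    state: float       # toy 1-D position, standing in for a robot state
    action: float
    reward: float
    next_state: float


def dense_reward(next_state: float, goal: float) -> float:
    """Hypothetical shaped reward: negative distance to the goal."""
    return -abs(goal - next_state)


def sparse_reward(next_state: float, goal: float, tol: float = 0.05) -> float:
    """Hypothetical sparse reward: 1 only when the goal is reached."""
    return 1.0 if abs(goal - next_state) <= tol else 0.0


def init_params(n: int, rng: random.Random) -> List[float]:
    """Fresh random parameters, standing in for network initialisation."""
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]


def relabel(
    buffer: List[Transition],
    reward_fn: Callable[[float, float], float],
    goal: float,
) -> List[Transition]:
    """Recompute the stored reward of every transition with reward_fn."""
    return [replace(t, reward=reward_fn(t.next_state, goal)) for t in buffer]


def dense_to_sparse_handoff(
    buffer: List[Transition], n_params: int, goal: float, rng: random.Random
) -> Tuple[List[float], List[Transition]]:
    """Assumed handoff: discard the old parameters, keep the experience.

    The parameters learned under the dense reward are replaced by a fresh
    initialisation, while the transitions gathered so far are kept and
    relabeled with the sparse reward.
    """
    new_params = init_params(n_params, rng)
    new_buffer = relabel(buffer, sparse_reward, goal)
    return new_params, new_buffer


rng = random.Random(0)
goal = 1.0
# Experience collected while training on the dense reward.
buffer = [
    Transition(0.0, 0.5, dense_reward(0.5, goal), 0.5),
    Transition(0.5, 0.5, dense_reward(1.0, goal), 1.0),
]
params, sparse_buffer = dense_to_sparse_handoff(buffer, n_params=4, goal=goal, rng=rng)
print([t.reward for t in sparse_buffer])  # [0.0, 1.0]
```

In this reading, the exploration done under the dense reward survives the switch through the replay buffer, while any bias the dense reward introduced into the parameters is discarded by the reset. When to trigger the switch and which networks to reset are not stated in the summary and are left open here.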