TEMPPO: Twin Entropy Maximized Proximal Policy Optimization


Bibliographic Details
Published in: 2022 27th International Computer Conference, Computer Society of Iran (CSICC), pp. 1 - 6
Main Authors: Shahrokhi, S.A.; Ahmadi, Ali
Format: Conference Proceeding
Language: English
Published: IEEE, 23.02.2022
Summary: In recent years, deep reinforcement learning has progressed rapidly, and increasingly complex environments have been tackled with new deep reinforcement learning algorithms. The game of football is a complex task with a sparse-reward environment. Actor-critic methods are gaining popularity, but they face two important challenges: overestimation error, and how to explore effectively in large environments with sparse rewards. To tackle the overestimation error, this paper proposes using twin critics; for exploration, it proposes a new way to incorporate entropy into the objective and cost functions. The new method, named TEMPPO, is based on the Proximal Policy Optimization algorithm. In this paper, the results of TEMPPO are evaluated on the Google Research Football environment.
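The abstract names two generic mechanisms: taking the minimum of two critics to curb overestimation, and adding an entropy bonus to the PPO objective to encourage exploration. The paper's exact formulation is not reproduced in this record, so the snippet below is only a minimal sketch of those two standard ideas; the function names, coefficients (`ent_coef`, `clip_eps`), and the specific way the entropy term enters the objective are assumptions for illustration, not TEMPPO's actual equations.

```python
import numpy as np

def twin_critic_target(rewards, v1_next, v2_next, gamma=0.99):
    """Bootstrapped value target using the elementwise minimum of two
    critics, the usual twin-critic trick against overestimation."""
    return rewards + gamma * np.minimum(v1_next, v2_next)

def ppo_entropy_objective(ratio, advantage, entropy,
                          clip_eps=0.2, ent_coef=0.01):
    """Clipped PPO surrogate plus an entropy bonus.

    ratio     : pi_new(a|s) / pi_old(a|s) per sample
    advantage : advantage estimate per sample
    entropy   : policy entropy per sample (the exploration bonus)
    """
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    surrogate = np.minimum(ratio * advantage, clipped * advantage)
    # Maximize surrogate return plus a weighted entropy term.
    return surrogate.mean() + ent_coef * entropy.mean()
```

For example, with a single transition where `rewards = [1.0]` and the two critics predict 2.0 and 3.0 for the next state, the target bootstraps from the smaller estimate: `1.0 + 0.99 * 2.0 = 2.98`.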
DOI:10.1109/CSICC55295.2022.9780488