TEMPPO: Twin Entropy Maximized Proximal Policy Optimization


Bibliographic Details
Published in: 2022 27th International Computer Conference, Computer Society of Iran (CSICC), pp. 1 - 6
Main Authors: Shahrokhi, S.A.; Ahmadi, Ali
Format: Conference Proceeding
Language: English
Published: IEEE, 23.02.2022
Summary: In recent years, deep reinforcement learning has progressed rapidly, and increasingly complex environments have been tackled with new deep reinforcement learning algorithms. The game of football is a complex task with a sparse-reward environment. Actor-critic methods are gaining popularity, but they face two important challenges: overestimation error, and how to explore effectively in large environments with sparse rewards. To tackle the overestimation error, this paper proposes using twin critics; for exploration, it proposes a new way to incorporate entropy into the objective and cost functions. The new method, named TEMPPO, is based on the Proximal Policy Optimization algorithm. In this paper, the results of TEMPPO are evaluated on the Google Research Football environment.
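The abstract names two generic mechanisms: taking the minimum of two critics to curb overestimation, and adding an entropy bonus to the PPO objective to encourage exploration. The paper's exact formulation is not reproduced in this record, so the snippet below is only a minimal sketch of those two standard ideas; the function names, coefficients (`ent_coef`, `clip_eps`), and the specific way the entropy term enters the objective are assumptions for illustration, not TEMPPO's actual equations.

```python
import numpy as np

def twin_critic_target(rewards, v1_next, v2_next, gamma=0.99):
    """Bootstrapped value target using the elementwise minimum of two
    critics, the usual twin-critic trick against overestimation."""
    return rewards + gamma * np.minimum(v1_next, v2_next)

def ppo_entropy_objective(ratio, advantage, entropy,
                          clip_eps=0.2, ent_coef=0.01):
    """Clipped PPO surrogate plus an entropy bonus.

    ratio     : pi_new(a|s) / pi_old(a|s) per sample
    advantage : advantage estimate per sample
    entropy   : policy entropy per sample (the exploration bonus)
    """
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    surrogate = np.minimum(ratio * advantage, clipped * advantage)
    # Maximize surrogate return plus a weighted entropy term.
    return surrogate.mean() + ent_coef * entropy.mean()
```

For example, with a single transition where `rewards = [1.0]` and the two critics predict 2.0 and 3.0 for the next state, the target bootstraps from the smaller estimate: `1.0 + 0.99 * 2.0 = 2.98`.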
DOI:10.1109/CSICC55295.2022.9780488