Application of Deep Reinforcement Learning in Guandan Game

Bibliographic Details
Published in	2022 34th Chinese Control and Decision Conference (CCDC), pp. 3499-3504
Main Authors	Pan, Jiahong; Zhang, Zhongtian; Shen, Hengheng; Zeng, Yi; Wu, Lei
Format	Conference Proceeding
Language	English
Published	IEEE 15.08.2022
Subjects	Artificial intelligence; Deep learning; Deep Reinforcement Learning; Economics; Games; Guandan; Imperfect Information Game; Optimization; Proximal Policy Optimization Algorithm; Reinforcement learning; Self-Learning; Video games

More Information
Summary:	In recent years, imperfect-information games have become an important touchstone for testing the level of artificial intelligence. Many real-world scenarios are imperfect-information games, such as economic transactions, military games, and autonomous driving. The study of imperfect-information game problems therefore has great practical significance. Guandan is an imperfect-information card game with four players divided into two teams. The large amount of hidden information in Guandan leads to a high-dimensional game state. Reinforcement learning algorithms are efficient at strategy search in computer games, but they fail to converge under the imperfect information and high-dimensional state space that Guandan presents. To address these problems, this paper introduces the Proximal Policy Optimization (PPO) algorithm, based on deep reinforcement learning, to handle the imperfect information and the high-dimensional state and action spaces. It enables the agent to perceive high-dimensional information and make decisions according to the information acquired. The experimental results show that the decision model based on the Proximal Policy Optimization algorithm outperforms those based on the Policy Gradient and A2C algorithms, which demonstrates that the system has a self-learning ability to improve its level of play in Guandan.
ISSN:	1948-9447
DOI:	10.1109/CCDC55256.2022.10033565