A model-based reinforcement learning method based on conditional generative adversarial networks



Bibliographic Details
Published in: Pattern Recognition Letters, Vol. 152, pp. 18-25
Main Authors: Zhao, Tingting; Wang, Ying; Li, Guixi; Kong, Le; Chen, Yarui; Wang, Yuan; Xie, Ning; Yang, Jucheng
Format: Journal Article
Language: English
Published: Amsterdam: Elsevier B.V. (Elsevier Science Ltd), 01.12.2021

Summary:
•A conditional generative adversarial network (CGAN) based model-based RL method is proposed.
•The CGAN-based model learning method can generate sufficient samples for policy learning.
•The CGAN-based model learning method does not require an explicit expression of the transition model.
•Performance is improved and sample efficiency is guaranteed.

Deep reinforcement learning (DRL) integrates the perceptual strengths of deep learning and enables reinforcement learning to scale to problems with high-dimensional state and action spaces that were previously intractable. The success of DRL relies primarily on the high-level representation ability of deep learning. Obtaining a well-performing representation model, however, demands a large number of training samples and long training times, and collecting many samples in the real world is extremely expensive and time-consuming. To mitigate this sample-inefficiency problem, we propose a novel model-based reinforcement learning method that combines conditional generative adversarial networks with a state-of-the-art policy learning method (CGAN-MbRL). The proposed CGAN-MbRL can directly handle high-dimensional states and mitigates sample inefficiency to some extent. Finally, the effectiveness of the proposed method is demonstrated on illustrative data and an RL benchmark.
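The abstract describes using a learned conditional generative model of the environment dynamics to produce synthetic transitions for policy learning, in place of costly real-world interaction. The sketch below illustrates that rollout loop only; the CGAN generator here is a hypothetical stand-in (a fixed stochastic linear map, not the paper's trained network), and all dimensions, names, and the zero-action policy are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained CGAN generator G(s' | s, a, z):
# it conditions on state and action and injects latent noise z so that
# sampled next states are stochastic. The real method trains this map
# adversarially; a fixed linear map is used here just to make the
# model-based rollout loop runnable.
STATE_DIM, ACTION_DIM, NOISE_DIM = 4, 1, 8
W_s = rng.normal(scale=0.1, size=(STATE_DIM, STATE_DIM))
W_a = rng.normal(scale=0.1, size=(ACTION_DIM, STATE_DIM))
W_z = rng.normal(scale=0.01, size=(NOISE_DIM, STATE_DIM))

def generator(state, action, noise):
    """Conditional sample of the next state given (state, action, noise)."""
    return state + state @ W_s + action @ W_a + noise @ W_z

def synthetic_rollout(policy, s0, horizon):
    """Generate a trajectory from the learned model instead of the real env.

    These synthetic transitions are what a model-based RL method would feed
    to its policy learner to reduce real-environment sample requirements.
    """
    transitions = []
    s = s0
    for _ in range(horizon):
        a = policy(s)
        z = rng.normal(size=NOISE_DIM)   # latent noise makes the model stochastic
        s_next = generator(s, a, z)
        transitions.append((s, a, s_next))
        s = s_next
    return transitions

# Example usage: a trivial zero-action policy, rolled out for 5 model steps.
rollout = synthetic_rollout(lambda s: np.zeros(ACTION_DIM), np.ones(STATE_DIM), horizon=5)
print(len(rollout))  # 5
```

The point of the sketch is the substitution: once a conditional generative model of the transition dynamics is available, arbitrarily many trajectories can be sampled from it at negligible cost, which is the sample-efficiency argument the abstract makes.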
ISSN: 0167-8655
eISSN: 1872-7344
DOI: 10.1016/j.patrec.2021.08.019