A model-based reinforcement learning method based on conditional generative adversarial networks



Bibliographic Details
Published in: Pattern Recognition Letters, Vol. 152, pp. 18-25
Main Authors: Zhao, Tingting; Wang, Ying; Li, Guixi; Kong, Le; Chen, Yarui; Wang, Yuan; Xie, Ning; Yang, Jucheng
Format: Journal Article
Language: English
Published: Amsterdam: Elsevier B.V. (Elsevier Science Ltd), 01.12.2021

Summary:
•A conditional generative adversarial network (CGAN) based model-based RL method is proposed.
•The CGAN-based model learning method can generate sufficient samples for policy learning.
•The CGAN-based model learning method does not require an explicit expression of the transition model.
•Performance is improved and sample efficiency is guaranteed.

Deep reinforcement learning (DRL) integrates the perceptual strengths of deep learning and enables reinforcement learning to scale to problems with high-dimensional state and action spaces that were previously intractable. The success of DRL relies primarily on the high-level representation ability of deep learning. Obtaining a well-performing representation model, however, demands a large number of training samples and long training times, and collecting many samples in the real world is extremely expensive and time-consuming. To mitigate this sample-inefficiency problem, we propose a novel model-based reinforcement learning method that combines conditional generative adversarial networks with a state-of-the-art policy learning method (CGAN-MbRL). The proposed CGAN-MbRL can directly handle high-dimensional states and mitigates sample inefficiency to some extent. Finally, the effectiveness of the proposed method is demonstrated on illustrative data and an RL benchmark.
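The abstract describes using a learned conditional generative model of the environment dynamics to produce synthetic transitions for policy learning, in place of costly real-world interaction. The sketch below illustrates that rollout loop only; the CGAN generator here is a hypothetical stand-in (a fixed stochastic linear map, not the paper's trained network), and all dimensions, names, and the zero-action policy are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained CGAN generator G(s' | s, a, z):
# it conditions on state and action and injects latent noise z so that
# sampled next states are stochastic. The real method trains this map
# adversarially; a fixed linear map is used here just to make the
# model-based rollout loop runnable.
STATE_DIM, ACTION_DIM, NOISE_DIM = 4, 1, 8
W_s = rng.normal(scale=0.1, size=(STATE_DIM, STATE_DIM))
W_a = rng.normal(scale=0.1, size=(ACTION_DIM, STATE_DIM))
W_z = rng.normal(scale=0.01, size=(NOISE_DIM, STATE_DIM))

def generator(state, action, noise):
    """Conditional sample of the next state given (state, action, noise)."""
    return state + state @ W_s + action @ W_a + noise @ W_z

def synthetic_rollout(policy, s0, horizon):
    """Generate a trajectory from the learned model instead of the real env.

    These synthetic transitions are what a model-based RL method would feed
    to its policy learner to reduce real-environment sample requirements.
    """
    transitions = []
    s = s0
    for _ in range(horizon):
        a = policy(s)
        z = rng.normal(size=NOISE_DIM)   # latent noise makes the model stochastic
        s_next = generator(s, a, z)
        transitions.append((s, a, s_next))
        s = s_next
    return transitions

# Example usage: a trivial zero-action policy, rolled out for 5 model steps.
rollout = synthetic_rollout(lambda s: np.zeros(ACTION_DIM), np.ones(STATE_DIM), horizon=5)
print(len(rollout))  # 5
```

The point of the sketch is the substitution: once a conditional generative model of the transition dynamics is available, arbitrarily many trajectories can be sampled from it at negligible cost, which is the sample-efficiency argument the abstract makes.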
ISSN: 0167-8655
eISSN: 1872-7344
DOI: 10.1016/j.patrec.2021.08.019