Constrained Expectation-Maximization Methods for Effective Reinforcement Learning

Bibliographic Details
Published in: 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1 - 8
Main Authors: Chen, Gang; Peng, Yiming; Zhang, Mengjie
Format: Conference Proceeding
Language: English
Published: IEEE, 01.07.2018
Summary: Recent advances in reinforcement learning (RL) algorithms show that effective learning of parametric action-selection policies can often be achieved through direct optimization of a performance lower bound subject to pre-defined policy behavioral constraints. Driven by this understanding, this paper seeks to develop new policy search techniques in which RL is achieved by maximizing a performance lower bound originally obtained through an Expectation-Maximization method. For reliable RL, the new learning techniques must also simultaneously guarantee constrained policy behavioral changes measured through KL divergence. Two separate approaches are pursued to tackle the resulting constrained policy optimization problems, yielding two new RL algorithms. The first algorithm uses a conjugate gradient technique and a Bayesian learning method for approximate optimization. The second algorithm minimizes a loss function derived from solving the Lagrangian of the constrained policy search problem. Both algorithms were experimentally examined on several benchmark problems provided by OpenAI Gym. The experimental results clearly demonstrate that our algorithms can be highly effective in comparison to several well-known RL algorithms.
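
The constrained objective described in the summary, in which an EM-derived performance lower bound is maximized while the KL divergence between the old policy and the updated policy stays bounded, follows the general pattern of penalized (Lagrangian) policy improvement: maximize the expected advantage under the new policy minus a multiplier times KL(pi_old || pi_new). The sketch below illustrates that general pattern for a discrete softmax policy; the function names, the fixed multiplier lam, and the per-action advantages are illustrative assumptions and do not reproduce the authors' algorithms.

# Hypothetical sketch: KL-penalized policy improvement for a discrete
# softmax policy. Gradient ascent on
#   sum_a pi_theta(a) * A(a)  -  lam * KL(pi_old || pi_theta)
# All values below are illustrative, not the paper's implementation.
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q):
    # KL(p || q) for discrete distributions with full support.
    return float(np.sum(p * (np.log(p) - np.log(q))))

def penalized_update(old_logits, advantages, lam=1.0, lr=0.1, steps=50):
    pi_old = softmax(old_logits)
    theta = old_logits.copy()
    for _ in range(steps):
        pi = softmax(theta)
        baseline = np.dot(pi, advantages)
        # Gradient of the expected advantage under pi_theta w.r.t. logits.
        grad_perf = pi * (advantages - baseline)
        # Gradient of -lam * KL(pi_old || pi_theta) w.r.t. logits.
        grad_kl = lam * (pi_old - pi)
        theta = theta + lr * (grad_perf + grad_kl)
    return theta, kl(pi_old, softmax(theta))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    old_logits = rng.normal(size=4)
    advantages = np.array([1.0, -0.5, 0.2, -1.0])
    for lam in (0.1, 1.0, 10.0):
        _, divergence = penalized_update(old_logits, advantages, lam=lam)
        print(f"lambda={lam:5.1f}  KL(old||new)={divergence:.4f}")

Running the sketch shows that a larger multiplier trades policy improvement for a smaller KL(old || new), mirroring the role the behavioral constraint plays in the paper's constrained policy search.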
ISSN: 2161-4407
DOI: 10.1109/IJCNN.2018.8488990