Constrained Expectation-Maximization Methods for Effective Reinforcement Learning

Bibliographic Details
Published in: 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1 - 8
Main Authors: Chen, Gang; Peng, Yiming; Zhang, Mengjie
Format: Conference Proceeding
Language: English
Published: IEEE, 01.07.2018
Summary: Recent advances in reinforcement learning (RL) algorithms show that effective learning of parametric action-selection policies can often be achieved through direct optimization of a performance lower bound subject to pre-defined policy behavioral constraints. Driven by this understanding, this paper seeks to develop new policy search techniques in which RL is achieved by maximizing a performance lower bound originally obtained through an Expectation-Maximization method. For reliable RL, the new learning techniques must also simultaneously guarantee constrained policy behavioral changes measured through KL divergence. Two separate approaches are pursued to tackle the resulting constrained policy optimization problems, yielding two new RL algorithms. The first algorithm uses a conjugate gradient technique and a Bayesian learning method for approximate optimization. The second algorithm minimizes a loss function derived from solving the Lagrangian of the constrained policy search problem. Both algorithms were experimentally examined on several benchmark problems provided by OpenAI Gym. The experimental results clearly demonstrate that our algorithms can be highly effective in comparison to several well-known RL algorithms.
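
The constrained objective described in the summary, in which an EM-derived performance lower bound is maximized while the KL divergence between the old policy and the updated policy stays bounded, follows the general pattern of penalized (Lagrangian) policy improvement: maximize the expected advantage under the new policy minus a multiplier times KL(pi_old || pi_new). The sketch below illustrates that general pattern for a discrete softmax policy; the function names, the fixed multiplier lam, and the per-action advantages are illustrative assumptions and do not reproduce the authors' algorithms.

# Hypothetical sketch: KL-penalized policy improvement for a discrete
# softmax policy. Gradient ascent on
#   sum_a pi_theta(a) * A(a)  -  lam * KL(pi_old || pi_theta)
# All values below are illustrative, not the paper's implementation.
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q):
    # KL(p || q) for discrete distributions with full support.
    return float(np.sum(p * (np.log(p) - np.log(q))))

def penalized_update(old_logits, advantages, lam=1.0, lr=0.1, steps=50):
    pi_old = softmax(old_logits)
    theta = old_logits.copy()
    for _ in range(steps):
        pi = softmax(theta)
        baseline = np.dot(pi, advantages)
        # Gradient of the expected advantage under pi_theta w.r.t. logits.
        grad_perf = pi * (advantages - baseline)
        # Gradient of -lam * KL(pi_old || pi_theta) w.r.t. logits.
        grad_kl = lam * (pi_old - pi)
        theta = theta + lr * (grad_perf + grad_kl)
    return theta, kl(pi_old, softmax(theta))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    old_logits = rng.normal(size=4)
    advantages = np.array([1.0, -0.5, 0.2, -1.0])
    for lam in (0.1, 1.0, 10.0):
        _, divergence = penalized_update(old_logits, advantages, lam=lam)
        print(f"lambda={lam:5.1f}  KL(old||new)={divergence:.4f}")

Running the sketch shows that a larger multiplier trades policy improvement for a smaller KL(old || new), mirroring the role the behavioral constraint plays in the paper's constrained policy search.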
ISSN: 2161-4407
DOI: 10.1109/IJCNN.2018.8488990