Constrained Expectation-Maximization Methods for Effective Reinforcement Learning
| Published in | 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1-8 |
| --- | --- |
| Main Authors | |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.07.2018 |
| Summary | Recent advances in reinforcement learning (RL) algorithms show that effective learning of parametric action-selection policies can often be achieved through direct optimization of a performance lower bound subject to pre-defined policy behavioral constraints. Driven by this understanding, this paper seeks to develop new policy search techniques where RL is achieved through maximizing a performance lower bound obtained originally from an Expectation-Maximization method. For reliable RL, our new learning techniques must also simultaneously guarantee constrained policy behavioral changes measured through KL divergence. Two separate approaches are pursued to tackle the constrained policy optimization problem, resulting in two new RL algorithms. The first algorithm utilizes a conjugate gradient technique and a Bayesian learning method for approximate optimization. The second algorithm focuses on minimizing a loss function derived from solving the Lagrangian of the constrained policy search problem. Both algorithms have been experimentally examined on several benchmark problems provided by OpenAI Gym. The experimental results clearly demonstrate that our algorithms can be highly effective in comparison to several well-known RL algorithms. |
| ISSN | 2161-4407 |
| DOI | 10.1109/IJCNN.2018.8488990 |
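
The second algorithm described in the summary minimizes a loss derived from the Lagrangian of the KL-constrained policy search problem. The sketch below is a minimal, hypothetical PyTorch rendering of that idea, not the authors' implementation: the advantage-weighted log-likelihood stands in for the EM-style performance lower bound, and a KL-divergence penalty between the old and updated policies plays the role of the behavioral constraint. Names such as `PolicyNet`, `kl_penalized_loss`, and `kl_coef` are illustrative assumptions.

```python
# Hypothetical sketch of a Lagrangian-style, KL-penalized policy search loss.
# Assumes a categorical policy on a discrete-action benchmark (e.g. CartPole).
import torch
import torch.nn as nn
from torch.distributions import Categorical, kl_divergence


class PolicyNet(nn.Module):
    """Small categorical policy network (illustrative, not from the paper)."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )

    def dist(self, obs: torch.Tensor) -> Categorical:
        return Categorical(logits=self.net(obs))


def kl_penalized_loss(policy, old_policy, obs, actions, advantages, kl_coef=1.0):
    """Negative advantage-weighted log-likelihood (the EM-style lower bound)
    plus a KL penalty that keeps the updated policy close to the old one."""
    new_dist = policy.dist(obs)
    with torch.no_grad():
        old_dist = old_policy.dist(obs)  # fixed reference policy, no gradient
    lower_bound = (advantages * new_dist.log_prob(actions)).mean()
    kl = kl_divergence(old_dist, new_dist).mean()
    return -lower_bound + kl_coef * kl
```

In this relaxed formulation the fixed multiplier `kl_coef` takes the place of an explicit KL constraint; tightening the constraint corresponds to raising the penalty weight, which is one common way a Lagrangian-derived loss of this kind is used in practice.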