A kernel based true online Sarsa(λ) for continuous space control problems
Published in | Computer Science and Information Systems Vol. 14; no. 3; pp. 789 - 804 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | 01.09.2017 |
Summary: | Reinforcement learning is an efficient learning method for control
problems, in which an agent interacts with the environment to obtain an
optimal policy. However, it also faces challenges such as low convergence
accuracy and slow convergence. Moreover, conventional reinforcement learning
algorithms can hardly solve continuous control problems. The kernel-based
method can accelerate convergence and improve convergence accuracy, and the
policy gradient method is a good way to deal with continuous-space
problems. We proposed a Sarsa(λ) version of the true online temporal-difference
algorithm, named True Online Sarsa(λ) (TOSarsa(λ)), on the basis of the
clustering-based sample specification method and a selective kernel-based
value function. The TOSarsa(λ) algorithm gives consistent results under both
the forward view and the backward view, which ensures that an optimal
policy is obtained in less time. Afterwards, we also combined TOSarsa(λ) with
heuristic dynamic programming. The experiments showed that the proposed
algorithm worked well on continuous control problems. |
---|---|
ISSN: | 1820-0214 2406-1018 |
DOI: | 10.2298/CSIS170107029Z |