Adaptive internal state space construction method for reinforcement learning of a real-world agent

Bibliographic Details
Published in: Neural Networks, Vol. 12, No. 7, pp. 1143-1155
Main Authors: Samejima, K.; Omori, T.
Format: Journal Article
Language: English
Published: United States: Elsevier Ltd, 01.10.1999
Summary: One of the difficulties encountered in applying reinforcement learning to real-world problems is the construction of a discrete state space from a continuous sensory input signal. In the absence of a priori knowledge about the task, a straightforward approach is to discretize the input space into a grid and use a lookup table. However, this method suffers from the curse of dimensionality. Some studies use continuous function approximators, such as neural networks, instead of lookup tables. However, when global basis functions such as sigmoid functions are used, convergence cannot be guaranteed. To overcome this problem, we propose a method in which local basis functions are incrementally assigned according to the requirements of the task. Initially, a single basis function is allocated over the entire space. A basis function is then divided according to the statistical properties of the locally weighted temporal-difference (TD) error of the value function. We applied this method to an autonomous-robot collision-avoidance problem and evaluated the validity of the algorithm in simulation. The proposed algorithm, which we call the adaptive basis division (ABD) algorithm, achieved the task using fewer basis functions than conventional methods. We also applied the method to a goal-directed navigation problem for a real mobile robot: the action strategy was learned from a database of sensor data and then used to navigate the real machine. The robot reached the goal using fewer internal states than with conventional methods.
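
As a companion to the summary, here is a minimal Python sketch of the incremental division scheme it describes: one local basis (here an axis-aligned box) initially covers the whole input space, per-region TD-error statistics are accumulated during learning, and a region is split when those statistics suggest a single basis can no longer represent the value function there. The class names, the variance trigger, and the split-along-the-widest-dimension rule are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch of the adaptive-basis-division idea: start with one region
# covering the whole input space, track locally observed TD-error statistics,
# and split a region whose statistics indicate a poor local value fit.
# Thresholds and the splitting rule are illustrative assumptions.
import numpy as np


class Region:
    def __init__(self, low, high):
        self.low = np.asarray(low, dtype=float)    # lower corner of the box
        self.high = np.asarray(high, dtype=float)  # upper corner of the box
        self.value = 0.0      # single value-function weight for this region
        self.td_sum = 0.0     # running sum of TD errors observed here
        self.td_sq_sum = 0.0  # running sum of squared TD errors
        self.visits = 0       # number of updates in this region

    def contains(self, x):
        return np.all(self.low <= x) and np.all(x < self.high)


class ABDValueFunction:
    def __init__(self, low, high, alpha=0.1, gamma=0.95,
                 split_after=50, var_threshold=0.5):
        self.regions = [Region(low, high)]  # one basis over the entire space
        self.alpha = alpha                  # TD learning rate
        self.gamma = gamma                  # discount factor
        self.split_after = split_after      # minimum visits before splitting
        self.var_threshold = var_threshold  # TD-error variance trigger (assumed)

    def region_of(self, x):
        for r in self.regions:
            if r.contains(x):
                return r
        return self.regions[-1]  # fallback for points on the outer boundary

    def value(self, x):
        return self.region_of(x).value

    def update(self, x, reward, x_next, done):
        """One TD(0) update, plus bookkeeping for the division criterion."""
        r = self.region_of(x)
        target = reward if done else reward + self.gamma * self.value(x_next)
        delta = target - r.value  # temporal-difference error
        r.value += self.alpha * delta
        r.td_sum += delta
        r.td_sq_sum += delta ** 2
        r.visits += 1
        self._maybe_split(r)

    def _maybe_split(self, r):
        if r.visits < self.split_after:
            return
        mean = r.td_sum / r.visits
        var = r.td_sq_sum / r.visits - mean ** 2
        # High TD-error variance suggests one basis cannot fit the value here.
        if var > self.var_threshold:
            axis = int(np.argmax(r.high - r.low))  # split the widest dimension
            mid = 0.5 * (r.low[axis] + r.high[axis])
            left_high = r.high.copy()
            left_high[axis] = mid
            right_low = r.low.copy()
            right_low[axis] = mid
            left = Region(r.low, left_high)
            right = Region(right_low, r.high)
            left.value = right.value = r.value  # children inherit the estimate
            self.regions.remove(r)
            self.regions.extend([left, right])
```

The variance trigger above is only a stand-in for the paper's criterion based on the statistical properties of the locally weighted TD error; the sketch is meant to show how such a division test slots into an otherwise standard TD(0) update loop.
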
ISSN: 0893-6080
EISSN: 1879-2782
DOI: 10.1016/S0893-6080(99)00055-6