Reinforcement Learning for POMDP Environments Using State Representation with Reservoir Computing

Bibliographic Details
Published in: Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol. 26, No. 4, pp. 562-569
Main Authors: Yamashita, Kodai; Hamagami, Tomoki
Format: Journal Article
Language: English
Published: Tokyo: Fuji Technology Press Co. Ltd., 20.07.2022

Summary: One of the challenges in reinforcement learning concerns the partially observable Markov decision process (POMDP), in which an agent cannot observe the true state of the environment and perceives different states as the same. Our proposed method uses the agent's time-series information to deal with this imperfect-perception problem. In particular, the proposed method uses reservoir computing to transform the time series of observations into a non-linear state. A typical reservoir-computing model, the echo state network (ESN), transforms raw observations into reservoir states. The proposed method, named dual ESNs reinforcement learning, uses two ESNs specialized for observation and action information, respectively. The experimental results show the effectiveness of the proposed method in environments where imperfect-perception problems occur.
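The core transformation described in the summary, mapping a time series of raw observations into a high-dimensional non-linear reservoir state, can be sketched with a minimal echo state network update. This is an illustrative sketch only: the dimensions, weight scales, and spectral radius below are assumptions, not the paper's actual dual-ESN architecture or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed, not from the paper).
n_obs, n_res = 3, 50

# Input and recurrent weights are fixed and random in an ESN;
# only a readout layer would be trained.
W_in = rng.uniform(-0.5, 0.5, (n_res, n_obs))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))

# Rescale the recurrent weights so the spectral radius is below 1,
# a common sufficient condition for the echo state property.
W *= 0.9 / max(abs(np.linalg.eigvals(W)))

def esn_step(x, u):
    """One reservoir update: x_{t+1} = tanh(W_in u_t + W x_t)."""
    return np.tanh(W_in @ u + W @ x)

# Feed a short observation sequence through the reservoir; the final
# state x summarizes the observation history non-linearly.
x = np.zeros(n_res)
for u in rng.standard_normal((10, n_obs)):
    x = esn_step(x, u)

print(x.shape)  # (50,)
```

The resulting reservoir state, rather than the raw (aliased) observation, would then serve as the agent's state representation; the paper's method uses two such reservoirs, one for observations and one for actions.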
ISSN: 1343-0130; 1883-8014
DOI: 10.20965/jaciii.2022.p0562