Policy Sharing Using Aggregation Trees for Q-Learning in a Continuous State and Action Spaces

Bibliographic Details
Published in: IEEE Transactions on Cognitive and Developmental Systems, Vol. 12, No. 3, pp. 474-485
Main Authors: Chen, Yu-Jen; Jiang, Wei-Cheng; Ju, Ming-Yi; Hwang, Kao-Shing
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.09.2020

Summary: Q-learning is a generic approach that uses a finite, discrete state and action domain to estimate action values with tabular or function-approximation methods. An intelligent agent eventually learns policies from continuous sensory inputs and encodes these environmental inputs onto a discrete state space. The application of Q-learning in a continuous state/action domain is the subject of many studies. This paper uses a tree structure to approximate a Q-function in a continuous state domain. The agent selects a discretized action with the maximum Q-value, and this discretized action is then extended to a continuous action using an action bias function. Reinforcement learning is difficult for a single agent when the state space is huge, so the proposed architecture is also applied to a multiagent system, wherein an individual agent transfers its useful Q-values to other agents to accelerate the learning process. Policies are shared between agents by grafting the branches of trees in which Q-values are stored onto other trees. Simulation results show that the proposed architecture performs better than tabular Q-learning and significantly accelerates the learning process because all agents use the sharing mechanism to cooperate with each other.
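The abstract describes three mechanisms: a tree that aggregates the continuous state space into discrete cells holding Q-values, an action bias function that extends the greedy discretized action into a continuous action, and policy sharing by grafting Q-value branches between agents' trees. The sketch below is a minimal, hypothetical illustration of those ideas, not the paper's implementation: it assumes a 1-D state interval, a binary tree pre-split to a fixed depth, and a leaf-level copy as a stand-in for grafting a whole branch. All names (Node, AggTreeQ, graft_branch) are invented for illustration.

```python
import random

class Node:
    """One cell of a binary aggregation tree over a 1-D continuous state."""
    def __init__(self, lo, hi, n_actions, depth=0):
        self.lo, self.hi = lo, hi                 # state interval covered by this cell
        self.q = [0.0] * n_actions                # Q-values for the discretized actions
        self.left = self.right = None
        if depth > 0:                             # pre-split to a fixed depth (simplification;
            mid = 0.5 * (lo + hi)                 # the paper grows the tree adaptively)
            self.left = Node(lo, mid, n_actions, depth - 1)
            self.right = Node(mid, hi, n_actions, depth - 1)

    def leaf_for(self, s):
        """Descend to the leaf whose interval contains state s."""
        node = self
        while node.left is not None:
            node = node.left if s < 0.5 * (node.lo + node.hi) else node.right
        return node

class AggTreeQ:
    """Q-learning over tree-aggregated states (illustrative sketch only)."""
    def __init__(self, lo, hi, actions, depth=4, alpha=0.1, gamma=0.95, eps=0.1):
        self.root = Node(lo, hi, len(actions), depth)
        self.actions = actions                    # discretized action set
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, s, bias_fn=None):
        """Epsilon-greedy discrete action, optionally extended to a continuous one."""
        leaf = self.root.leaf_for(s)
        if random.random() < self.eps:
            a = random.randrange(len(self.actions))
        else:
            a = max(range(len(self.actions)), key=lambda i: leaf.q[i])
        u = self.actions[a]
        if bias_fn is not None:                   # action bias: shift the discrete
            u += bias_fn(s, a)                    # action into the continuous range
        return a, u

    def update(self, s, a, r, s_next):
        """One tabular Q-learning backup on the leaf that covers s."""
        leaf, nxt = self.root.leaf_for(s), self.root.leaf_for(s_next)
        leaf.q[a] += self.alpha * (r + self.gamma * max(nxt.q) - leaf.q[a])

def graft_branch(src, dst, s):
    """Policy sharing: copy the source agent's Q-values for the region around s
    into the destination tree (a leaf-level stand-in for grafting a subtree)."""
    dst.root.leaf_for(s).q = list(src.root.leaf_for(s).q)
```

As a usage sketch, two AggTreeQ agents over the interval [-1, 1] with actions [-1.0, 0.0, 1.0] could call graft_branch(expert, novice, s) once the expert's estimates around s have converged, so the novice starts from shared Q-values instead of zeros, which is the kind of acceleration the abstract attributes to the sharing mechanism.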
ISSN: 2379-8920
eISSN: 2379-8939
DOI: 10.1109/TCDS.2019.2926477