Action-Selection Method for Reinforcement Learning Based on Cuckoo Search Algorithm


Bibliographic Details
Published in: Arabian Journal for Science and Engineering, Vol. 43, No. 12 (2018), pp. 6771–6785
Main Author: Abed-alguni, Bilal H.
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg, 01.12.2018
Springer Nature B.V.

Summary: A fundamental challenge in reinforcement learning is balancing exploration and exploitation of actions. The exploration/exploitation ratio can significantly affect the total learning time and the quality of the learned policies, so several sophisticated action-selection methods have been proposed to strike this balance. However, current action-selection methods either require fine-tuning of their exploration parameters (e.g., undirected methods) or require heavy computation to converge to optimality (e.g., directed methods). This paper proposes a new action-selection method, called the cuckoo action-selection (CAS) method, that is based on the cuckoo search algorithm. Cuckoo search is a powerful optimization algorithm that increases the likelihood of finding optimal actions by balancing exploration and exploitation of actions through a single tuning parameter; an advantage of the algorithm is that its performance is not highly sensitive to that parameter. The performance of the CAS method was empirically compared with two widely used action-selection methods (ε-greedy and softmax) on three problems: the 10-armed bandit, cliff-walking and taxi-domain problems. The experimental results suggest that CAS outperforms the ε-greedy and softmax action-selection methods, and that its performance is not highly sensitive to its parameter settings.
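For context, a minimal sketch of the two baseline action-selection methods the abstract compares CAS against (ε-greedy and softmax), applied to a 10-armed bandit with sample-average value updates. This illustrates the well-known baselines only; the paper's CAS method itself is not specified in the abstract, so no implementation of it is attempted here.

```python
# Illustrative sketch (not from the paper): epsilon-greedy and softmax
# action selection on a 10-armed bandit with sample-average Q updates.
import math
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a random arm (explore); else the greedy arm."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def softmax(q_values, temperature=1.0):
    """Sample an arm with probability proportional to exp(Q / temperature)."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    return random.choices(range(len(q_values)), weights=weights)[0]

# 10-armed bandit: each arm's reward is Gaussian around a hidden mean.
true_means = [random.gauss(0.0, 1.0) for _ in range(10)]
q = [0.0] * 10       # estimated action values
counts = [0] * 10    # pulls per arm
for _ in range(1000):
    a = epsilon_greedy(q, epsilon=0.1)
    reward = random.gauss(true_means[a], 1.0)
    counts[a] += 1
    q[a] += (reward - q[a]) / counts[a]  # incremental sample-average update
```

The temperature in `softmax` and the ε in `epsilon_greedy` are exactly the exploration parameters the abstract says must be fine-tuned, which is the motivation for a method whose performance is less sensitive to its single parameter.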
ISSN: 2193-567X
1319-8025
2191-4281
DOI: 10.1007/s13369-017-2873-8