Action-Selection Method for Reinforcement Learning Based on Cuckoo Search Algorithm


Bibliographic Details
Published in: Arabian Journal for Science and Engineering, Vol. 43, No. 12 (2018), pp. 6771–6785
Main Author: Abed-alguni, Bilal H.
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg, 01.12.2018
Springer Nature B.V.

Summary: A fundamental challenge in reinforcement learning is balancing exploration and exploitation of actions. The exploration/exploitation ratio can significantly affect the total learning time and the quality of the learned policies, so several sophisticated action-selection methods have been proposed to strike this balance. However, current action-selection methods either require fine-tuning of their exploration parameters (e.g., undirected methods) or require heavy computation to converge to optimality (e.g., directed methods). This paper proposes a new action-selection method, called the cuckoo action-selection (CAS) method, that is based on the cuckoo search algorithm. Cuckoo search is a powerful optimization algorithm that increases the likelihood of finding optimal actions by balancing exploration and exploitation of actions through a single tuning parameter; an advantage of the algorithm is that its performance is not highly sensitive to that parameter. The performance of the CAS method was empirically compared with two widely used action-selection methods (ε-greedy and softmax) on three problems: the 10-armed bandit, cliff-walking and taxi-domain problems. The experimental results suggest that CAS outperforms the ε-greedy and softmax action-selection methods, and that its performance is not highly sensitive to its parameter settings.
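For context, a minimal sketch of the two baseline action-selection methods the abstract compares CAS against (ε-greedy and softmax), applied to a 10-armed bandit with sample-average value updates. This illustrates the well-known baselines only; the paper's CAS method itself is not specified in the abstract, so no implementation of it is attempted here.

```python
# Illustrative sketch (not from the paper): epsilon-greedy and softmax
# action selection on a 10-armed bandit with sample-average Q updates.
import math
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a random arm (explore); else the greedy arm."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def softmax(q_values, temperature=1.0):
    """Sample an arm with probability proportional to exp(Q / temperature)."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    return random.choices(range(len(q_values)), weights=weights)[0]

# 10-armed bandit: each arm's reward is Gaussian around a hidden mean.
true_means = [random.gauss(0.0, 1.0) for _ in range(10)]
q = [0.0] * 10       # estimated action values
counts = [0] * 10    # pulls per arm
for _ in range(1000):
    a = epsilon_greedy(q, epsilon=0.1)
    reward = random.gauss(true_means[a], 1.0)
    counts[a] += 1
    q[a] += (reward - q[a]) / counts[a]  # incremental sample-average update
```

The temperature in `softmax` and the ε in `epsilon_greedy` are exactly the exploration parameters the abstract says must be fine-tuned, which is the motivation for a method whose performance is less sensitive to its single parameter.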
ISSN: 2193-567X
1319-8025
2191-4281
DOI: 10.1007/s13369-017-2873-8