Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem

Bibliographic Details
Published in	PloS one Vol. 18; no. 3; p. e0283207
Main Authors	Zhang, Jin; Liu, Qing; Han, XiaoHang
Format	Journal Article
Language	English
Published	United States: Public Library of Science (PLoS), 21.03.2023
Subjects	Algorithms; Analysis; Biology and Life Sciences; Computer and Information Sciences; Data mining; Decision making; Deep learning; Design; Earth Sciences; Genetic algorithms; Health aspects; Heuristic; Integer programming; Learning; Machine learning; Management science; Mathematical optimization; Methods; Mutation; Neural networks; Optimization; Parameter estimation; Physical Sciences; Reinforcement; Reinforcement, Psychology; Research and Analysis Methods; Reward; Sales personnel; Search algorithms; Search methods; Social aspects; Social Sciences; Travel; Traveling salesman problem; Weighting functions; China

Summary:	This paper proposes a dynamic sub-route-based self-adaptive beam search Q-learning (DSRABSQL) algorithm, a reinforcement learning (RL) framework combined with local search, to solve the traveling salesman problem (TSP). DSRABSQL builds upon the Q-learning (QL) algorithm. To address QL's slow convergence and low accuracy, four strategies are first designed within the QL framework: a weighting function-based reward matrix, a power function-based initial Q-table, a self-adaptive ε-beam search strategy, and a new Q-value update formula. Based on these, a self-adaptive beam search Q-learning (ABSQL) algorithm is designed. Because sub-routes are not fully optimized in ABSQL, a dynamic sub-route optimization strategy is introduced outside the QL framework, which gives the DSRABSQL algorithm. Experiments compare QL, ABSQL, DSRABSQL, the authors' previously proposed variable neighborhood discrete whale optimization algorithm, and two advanced reinforcement learning algorithms. The results show that DSRABSQL significantly outperforms the other algorithms. In addition, two groups of algorithms are designed based on the QL and DSRABSQL algorithms to test the effectiveness of the five strategies. The results indicate that the dynamic sub-route optimization strategy and the self-adaptive ε-beam search strategy contribute the most on small-, medium-, and large-scale instances. The four strategies within the QL framework also work collaboratively, and this collaboration increases as the instance scale grows.
Bibliography:	Competing Interests: The authors have declared that no competing interests exist.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0283207
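
The Summary states that DSRABSQL builds upon the Q-learning (QL) algorithm. As a rough illustration of that baseline only, the following is a minimal sketch of plain tabular Q-learning on a small TSP instance. It does not implement the paper's weighting function-based reward matrix, power function-based initial Q-table, self-adaptive ε-beam search, new Q-value update formula, or dynamic sub-route optimization, since this record does not give their formulas. The city-by-city Q-table, the negative-distance reward, the ε-greedy selection, and every name and parameter value (`q_learning_tsp`, `distance_matrix`, `alpha`, `gamma`, `epsilon`) are assumptions chosen for illustration.

```python
import math
import random


def distance_matrix(coords):
    """Euclidean distance between every pair of cities (hypothetical helper)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def tour_length(tour, dist):
    """Length of the closed tour, including the edge back to the start city."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def q_learning_tsp(dist, episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Plain tabular Q-learning for TSP (assumed baseline, not the paper's method).

    State = current city, action = next unvisited city,
    reward = negative edge length (an assumption for this sketch).
    """
    rng = random.Random(seed)
    n = len(dist)
    q = [[0.0] * n for _ in range(n)]  # zero-initialized Q-table (assumption)
    best_tour, best_len = None, math.inf

    for _ in range(episodes):
        current = 0  # every episode starts from city 0
        tour = [current]
        unvisited = set(range(1, n))

        while unvisited:
            candidates = sorted(unvisited)
            if rng.random() < epsilon:
                # Explore: pick a random unvisited city.
                nxt = rng.choice(candidates)
            else:
                # Exploit: pick the unvisited city with the highest Q-value.
                nxt = max(candidates, key=lambda c: q[current][c])
            unvisited.remove(nxt)

            reward = -dist[current][nxt]
            if unvisited:
                future = max(q[nxt][c] for c in unvisited)
            else:
                # Last city: the only remaining move is the edge back to the start.
                future = -dist[nxt][0]

            # Standard Q-learning update.
            q[current][nxt] += alpha * (reward + gamma * future - q[current][nxt])

            tour.append(nxt)
            current = nxt

        length = tour_length(tour, dist)
        if length < best_len:
            best_tour, best_len = tour, length

    return best_tour, best_len
```

A possible usage, on four cities placed at the corners of a unit square, is `best_tour, best_len = q_learning_tsp(distance_matrix([(0, 0), (1, 1), (0, 1), (1, 0)]))`. Tracking the best tour seen across episodes, rather than reading a tour off the final Q-table, is a design choice of this sketch.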