Heuristic algorithms based on deep reinforcement learning for quadratic unconstrained binary optimization

The unconstrained binary quadratic programming (UBQP) problem is a difficult combinatorial optimization problem that has been intensively studied in the past decades. Due to its NP-hardness, many heuristic algorithms have been developed for the solution of the UBQP. These algorithms are usually prob...

Full description

Saved in:
Bibliographic Details
Published inKnowledge-based systems Vol. 207; p. 106366
Main Authors Chen, Ming, Chen, Yuning, Du, Yonghao, Wei, Luona, Chen, Yingwu
Format Journal Article
LanguageEnglish
Published Amsterdam Elsevier B.V 05.11.2020
Elsevier Science Ltd
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:The unconstrained binary quadratic programming (UBQP) problem is a difficult combinatorial optimization problem that has been intensively studied in the past decades. Due to its NP-hardness, many heuristic algorithms have been developed for the solution of the UBQP. These algorithms are usually problem-tailored, which lack generality and scalability. To address these issues, a heuristic algorithm based on deep reinforcement learning (DRLH) is proposed in this paper. It features in inputting specific features and using a neural network model called NN to guild the selection of variable at each solution construction step. Also, to improve the algorithm speed and efficiency, two algorithm variants named simplified DRLH (DRLS) and DRLS with hill climbing (DRLS-HC) are developed as well. These three algorithms are examined through extensive experiments in comparison with famous heuristic algorithms from the literature. Experimental results show that the DRLH, DRLS, and DRLS-HC outperform their competitors in terms of both solution quality and computational efficiency. Precisely, the DRLH achieves the best-quality results, while DRLS offers a high-quality solution in a very short time. By adding a hill-climbing procedure to DRLS, the resulting DRLS-HC algorithm is able to obtain almost the same quality result as DRLH with however 5 times less computing time on average. We conducted additional experiments on large-scale instances and various data distributions to verify the generality and scalability of the proposed algorithms, and the results on benchmark instances indicate the ability of the algorithms to be applied to practical problems.
ISSN:0950-7051
1872-7409
DOI:10.1016/j.knosys.2020.106366