Heuristic algorithms based on deep reinforcement learning for quadratic unconstrained binary optimization

Bibliographic Details
Published in	Knowledge-based systems Vol. 207; p. 106366
Main Authors	Chen, Ming; Chen, Yuning; Du, Yonghao; Wei, Luona; Chen, Yingwu
Format	Journal Article
Language	English
Published	Amsterdam Elsevier B.V 05.11.2020 Elsevier Science Ltd
Subjects	Algorithms; Combinatorial analysis; Computational efficiency; Computing time; Deep learning; Deep reinforcement learning; Heuristic; Heuristic algorithm; Heuristic methods; Machine learning; Neural network; Neural networks; Optimization; Quadratic programming; Unconstrained binary quadratic programming
Summary:	The unconstrained binary quadratic programming (UBQP) problem is a difficult combinatorial optimization problem that has been studied intensively over the past decades. Because it is NP-hard, many heuristic algorithms have been developed to solve it. These algorithms are usually problem-tailored and therefore lack generality and scalability. To address these issues, this paper proposes a heuristic algorithm based on deep reinforcement learning (DRLH). It takes specific features as input and uses a neural network model, called NN, to guide the selection of a variable at each solution construction step. To improve speed and efficiency, two variants are also developed: simplified DRLH (DRLS) and DRLS with hill climbing (DRLS-HC). The three algorithms are evaluated through extensive experiments against well-known heuristic algorithms from the literature. Experimental results show that DRLH, DRLS, and DRLS-HC outperform their competitors in both solution quality and computational efficiency. Specifically, DRLH achieves the best-quality results, while DRLS delivers high-quality solutions in a very short time. Adding a hill-climbing procedure to DRLS yields DRLS-HC, which obtains results of almost the same quality as DRLH while requiring 5 times less computing time on average. Additional experiments on large-scale instances and various data distributions verify the generality and scalability of the proposed algorithms, and the results on benchmark instances indicate that the algorithms can be applied to practical problems.
ISSN:	0950-7051 1872-7409
DOI:	10.1016/j.knosys.2020.106366
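
Illustrative note: the summary describes building a solution one variable at a time and then, in DRLS-HC, refining it by hill climbing. The Python sketch below illustrates that general pattern only and is not the authors' implementation. It assumes a maximization form of the objective, x^T Q x with a symmetric integer matrix Q, and it replaces the paper's neural-network selection rule with a plain greedy rule as a stand-in. All function names (ubqp_value, flip_gain, greedy_construct, hill_climb, random_symmetric_instance) are hypothetical.

```python
import random


def ubqp_value(Q, x):
    """Objective x^T Q x for a binary vector x (maximization form assumed)."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def flip_gain(Q, x, i):
    """Change in the objective if bit i is flipped; Q is assumed symmetric."""
    off = sum(Q[i][j] * x[j] for j in range(len(x)) if j != i)
    delta = Q[i][i] + 2 * off  # gain of switching bit i from 0 to 1
    return delta if x[i] == 0 else -delta


def greedy_construct(Q):
    """Constructive step: set one variable to 1 per iteration.

    Stand-in for the learned selection rule: pick the unset variable
    with the largest positive gain, and stop when none improves.
    """
    n = len(Q)
    x = [0] * n
    while True:
        candidates = [(flip_gain(Q, x, i), i) for i in range(n) if x[i] == 0]
        if not candidates:
            break
        gain, i = max(candidates)
        if gain <= 0:
            break
        x[i] = 1
    return x


def hill_climb(Q, x):
    """One-flip hill climbing: apply the best improving flip until none exists."""
    x = list(x)
    while True:
        gain, i = max((flip_gain(Q, x, k), k) for k in range(len(x)))
        if gain <= 0:
            break
        x[i] = 1 - x[i]
    return x


def random_symmetric_instance(n, seed=0):
    """Small random symmetric integer instance, for demonstration only."""
    rng = random.Random(seed)
    Q = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            Q[i][j] = Q[j][i] = rng.randint(-10, 10)
    return Q


if __name__ == "__main__":
    Q = random_symmetric_instance(8)
    x0 = greedy_construct(Q)
    x1 = hill_climb(Q, x0)
    print("constructed:", x0, ubqp_value(Q, x0))
    print("after hill climbing:", x1, ubqp_value(Q, x1))
```

Because every accepted flip strictly increases an integer-valued objective, the hill-climbing loop terminates at a solution that no single flip can improve. In the approach the summary describes, a trained network would take the place of the greedy rule in the constructive step.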