A Reinforcement Learning Hyper-Heuristic with Cumulative Rewards for Dual-Peak Time-Varying Network Optimization in Heterogeneous Multi-Trip Vehicle Routing

Urban logistics face complexity due to traffic congestion, fleet heterogeneity, warehouse constraints, and driver workload balancing, especially in the Heterogeneous Multi-Trip Vehicle Routing Problem with Time Windows and Time-Varying Networks (HMTVRPTW-TVN). We develop a mixed-integer linear progr...

Full description

Saved in:

Bibliographic Details
Published in	Algorithms Vol. 18; no. 9; p. 536
Main Authors	Wang, Xiaochuan, Li, Na, Jin, Xingchen
Format	Journal Article
Language	English
Published	22.08.2025
Online Access	Get full text

Cover

Loading…

More Information
Summary:	Urban logistics face complexity due to traffic congestion, fleet heterogeneity, warehouse constraints, and driver workload balancing, especially in the Heterogeneous Multi-Trip Vehicle Routing Problem with Time Windows and Time-Varying Networks (HMTVRPTW-TVN). We develop a mixed-integer linear programming (MILP) model with dual-peak time discretization and exact linearization for heterogeneous fleet coordination. Given the NP-hard nature, we propose a Hyper-Heuristic based on Cumulative Reward Q-Learning (HHCRQL), integrating reinforcement learning with heuristic operators in a Markov Decision Process (MDP). The algorithm dynamically selects operators using a four-dimensional state space and a cumulative reward function combining timestep and fitness. Experiments show that, for small instances, HHCRQL achieves solutions within 3% of Gurobi’s optimum when customer nodes exceed 15, outperforming Large Neighborhood Search (LNS) and LNS with Simulated Annealing (LNSSA) with stable, shorter runtime. For large-scale instances, HHCRQL reduces gaps by up to 9.17% versus Iterated Local Search (ILS), 6.74% versus LNS, and 5.95% versus LNSSA, while maintaining relatively stable runtime. Real-world validation using Shanghai logistics data reduces waiting times by 35.36% and total transportation times by 24.68%, confirming HHCRQL’s effectiveness, robustness, and scalability.
ISSN:	1999-4893 1999-4893
DOI:	10.3390/a18090536