DPLRS: Distributed Population Learning Rate Schedule



Bibliographic Details
Published in	Future generation computer systems Vol. 132; pp. 40-50
Main Authors	Wei, Jia; Zhang, Xingjun; Ji, Zeyu; Wei, Zheng; Li, Jingbo
Format	Journal Article
Language	English
Published	Elsevier B.V. 01.07.2022
Subjects	Data parallel; Deep learning; Distributed training; Hyperparameter search; Population algorithm
Summary:	Deep neural network models perform very well in the field of artificial intelligence, but their success depends on hyperparameters. The learning rate schedule is one of the most important of these, yet searching for a good schedule is often time-consuming and computationally expensive. In this paper, we propose the Distributed Population Learning Rate Schedule (DPLRS), based on population joint optimization, which uses distributed data-parallel deep neural network training to implement a population-based dynamic learning rate schedule optimization strategy with almost no loss of test accuracy. DPLRS dynamically refines the learning rate schedule during model training instead of following the usual suboptimal strategy. We conducted experiments on the typical models AlexNet, VGG16, and ResNet18 using the Tianhe-3 supercomputing prototype. The results show that using DPLRS to dynamically update the learning rate greatly reduces the search time for the learning rate schedule while maintaining performance close to that of the latest population-based hyperparameter algorithm. In our experiments, DPLRS achieved a maximum speedup of 123.85x, which demonstrates its effectiveness and robustness.
ISSN:	0167-739X 1872-7115
DOI:	10.1016/j.future.2022.02.001
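
The summary describes refining a learning rate schedule during training with a population of candidates. The record does not give the DPLRS algorithm itself, so the following is only a minimal single-process sketch of the general population idea (train several workers with different learning rates, keep the best, perturb its rate for the others), not the authors' method. The toy loss f(w) = w**2, the function names, the perturbation factors 0.8 and 1.2, and the population size are all assumptions made for illustration; a distributed data-parallel setup is not modeled.

```python
import random


def loss(w):
    """Toy objective standing in for a validation loss (assumption)."""
    return w * w


def train_segment(w, lr, steps=5):
    """Run a few gradient-descent steps on the toy loss f(w) = w**2."""
    for _ in range(steps):
        grad = 2.0 * w
        w = w - lr * grad
    return w


def population_lr_search(pop_size=4, rounds=10, seed=0):
    """Generic population-based learning rate search (illustrative only)."""
    rng = random.Random(seed)
    # Every worker starts from the same weight with its own learning rate,
    # sampled log-uniformly from roughly 0.001 to 0.316.
    workers = [
        {"w": 5.0, "lr": 10 ** rng.uniform(-3.0, -0.5), "schedule": []}
        for _ in range(pop_size)
    ]
    for _ in range(rounds):
        # Train each worker for one segment and record the rate it used.
        for wk in workers:
            wk["w"] = train_segment(wk["w"], wk["lr"])
            wk["schedule"].append(wk["lr"])
        # Rank workers by loss; the first one is the current best.
        workers.sort(key=lambda wk: loss(wk["w"]))
        best = workers[0]
        # Exploit: the others copy the best weights and schedule history.
        # Explore: they continue with a perturbed copy of the best rate.
        for wk in workers[1:]:
            wk["w"] = best["w"]
            wk["schedule"] = list(best["schedule"])
            wk["lr"] = best["lr"] * rng.choice([0.8, 1.2])
    return workers[0]


if __name__ == "__main__":
    best = population_lr_search()
    print("final loss:", loss(best["w"]))
    print("learning rate per round:", best["schedule"])
```

The returned "schedule" list holds one learning rate per round for the surviving lineage, which is the sense in which the schedule is discovered during a single training run rather than by repeated full runs.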