A Communication Efficient ADMM-based Distributed Algorithm Using Two-Dimensional Torus Grouping AllReduce

Bibliographic Details
Published in: Data Science and Engineering, Vol. 8, No. 1, pp. 61-72
Main Authors: Wang, Guozheng; Lei, Yongmei; Zhang, Zeyu; Peng, Cunlu
Format: Journal Article
Language: English
Published: Singapore: Springer Nature Singapore, 01.03.2023 (Springer Nature B.V.; SpringerOpen)

Summary: Large-scale distributed training mainly consists of sub-model parallel training and parameter synchronization. As the number of training workers grows, the efficiency of parameter synchronization degrades. To tackle this problem, we first propose 2D-TGA, a grouping AllReduce method based on the two-dimensional torus topology. This method synchronizes the model parameters by grouping and makes full use of the available bandwidth. Secondly, we propose a distributed algorithm, 2D-TGA-ADMM, which combines 2D-TGA with the alternating direction method of multipliers (ADMM). It focuses on sub-model training and reduces the waiting time among workers during synchronization. Finally, experimental results on the Tianhe-2 supercomputing platform show that, compared with MPI_Allreduce, 2D-TGA can shorten the synchronization waiting time by 33%.
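The grouping idea behind a two-dimensional torus AllReduce can be illustrated with a minimal MPI sketch, assuming the workers form a ROWS x COLS logical grid and parameters are first combined within each row group and then across column groups. This is only an illustrative two-phase row/column AllReduce under those assumptions, not the paper's exact 2D-TGA procedure; ROWS, COLS, and PARAM_LEN are hypothetical example values.

/* Illustrative sketch of a grouped AllReduce over a logical 2D grid.
 * NOT the paper's exact 2D-TGA algorithm: workers are arranged in a
 * ROWS x COLS grid, parameters are first reduced within each row group,
 * then the row results are combined across column groups.
 * Assumes exactly ROWS * COLS MPI processes. */
#include <mpi.h>
#include <stdlib.h>

#define ROWS 4          /* assumed grid height      */
#define COLS 4          /* assumed grid width       */
#define PARAM_LEN 1024  /* assumed model dimension  */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Map the flat rank onto a 2D grid position. */
    int row = rank / COLS;
    int col = rank % COLS;

    /* Split the world communicator into per-row and per-column groups. */
    MPI_Comm row_comm, col_comm;
    MPI_Comm_split(MPI_COMM_WORLD, row, col, &row_comm);
    MPI_Comm_split(MPI_COMM_WORLD, col, row, &col_comm);

    double *params = malloc(PARAM_LEN * sizeof(double));
    double *buffer = malloc(PARAM_LEN * sizeof(double));
    for (int i = 0; i < PARAM_LEN; i++)
        params[i] = (double)rank;  /* dummy local parameters */

    /* Phase 1: sum parameters within each row group. */
    MPI_Allreduce(params, buffer, PARAM_LEN, MPI_DOUBLE, MPI_SUM, row_comm);

    /* Phase 2: combine the row sums across column groups,
     * so every worker ends up with the global sum. */
    MPI_Allreduce(buffer, params, PARAM_LEN, MPI_DOUBLE, MPI_SUM, col_comm);

    /* Turn the global sum into the global average. */
    for (int i = 0; i < PARAM_LEN; i++)
        params[i] /= (double)(ROWS * COLS);

    free(params);
    free(buffer);
    MPI_Comm_free(&row_comm);
    MPI_Comm_free(&col_comm);
    MPI_Finalize();
    return 0;
}

Splitting one global AllReduce into two smaller group-wise reductions is what lets each phase run over shorter communication paths of the torus, which is the intuition behind the reported reduction in synchronization waiting time.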
ISSN: 2364-1185, 2364-1541
DOI: 10.1007/s41019-022-00202-7