A Communication Efficient ADMM-based Distributed Algorithm Using Two-Dimensional Torus Grouping AllReduce
| Published in | Data Science and Engineering, Vol. 8, No. 1, pp. 61–72 |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published | Singapore: Springer Nature Singapore, 01.03.2023 (Springer Nature B.V.; SpringerOpen) |
Summary: Large-scale distributed training consists mainly of sub-model parallel training and parameter synchronization. As the number of training workers grows, the efficiency of parameter synchronization degrades. To tackle this problem, we first propose 2D-TGA, a grouping AllReduce method based on the two-dimensional torus topology. This method synchronizes the model parameters by grouping and makes full use of the available bandwidth. Secondly, we propose a distributed algorithm, 2D-TGA-ADMM, which combines 2D-TGA with the alternating direction method of multipliers (ADMM). It focuses on sub-model training and reduces the wait time among workers during synchronization. Finally, experimental results on the Tianhe-2 supercomputing platform show that, compared with MPI_Allreduce, 2D-TGA shortens the synchronization wait time by 33%.
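To make the grouping idea concrete, below is a minimal sketch of a row-then-column AllReduce over an R × C process grid, written with mpi4py. The abstract does not specify 2D-TGA's exact grouping schedule, so the grid layout, the function name `torus_grouped_allreduce`, and the final averaging step are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: generic two-phase (row, then column) grouped AllReduce on a
# rows x cols grid. The actual 2D-TGA schedule may differ from this pattern.
import numpy as np
from mpi4py import MPI

def torus_grouped_allreduce(local: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Average `local` across a rows x cols process grid in two group phases."""
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    assert comm.Get_size() == rows * cols, "world size must equal rows * cols"

    r, c = divmod(rank, cols)                 # this worker's grid coordinates
    row_comm = comm.Split(color=r, key=c)     # group 1: workers sharing a row
    col_comm = comm.Split(color=c, key=r)     # group 2: workers sharing a column

    # Phase 1: all row groups reduce in parallel; each worker gets its row sum.
    row_sum = np.empty_like(local)
    row_comm.Allreduce(local, row_sum, op=MPI.SUM)

    # Phase 2: column groups reduce the row sums; every worker now holds the
    # global sum, having only ever communicated inside small groups.
    total = np.empty_like(local)
    col_comm.Allreduce(row_sum, total, op=MPI.SUM)

    row_comm.Free()
    col_comm.Free()
    return total / (rows * cols)              # synchronized (averaged) parameters
```

Splitting one global AllReduce over P workers into two group operations lets the rows (and then the columns) communicate in parallel over shorter paths, which is what allows a torus-style scheme to exploit the aggregate link bandwidth.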
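The combination with ADMM can be illustrated by a standard consensus-ADMM loop, sketched below for a hypothetical per-worker least-squares sub-problem (the paper's actual sub-model objective is not given in the abstract). The point of the sketch is that each worker trains its sub-model locally, and only the averaging step needs global communication; that single AllReduce per iteration is the synchronization point a grouped scheme like 2D-TGA is designed to speed up.

```python
# Hedged sketch: consensus ADMM with one least-squares sub-problem per worker,
# min_x sum_k (1/2)||A_k x - b_k||^2. The per-worker objective is assumed,
# not taken from the paper.
import numpy as np
from mpi4py import MPI

def consensus_admm(A: np.ndarray, b: np.ndarray, rho: float = 1.0, iters: int = 100) -> np.ndarray:
    comm = MPI.COMM_WORLD
    P = comm.Get_size()
    n = A.shape[1]
    x = np.zeros(n)                      # local primal variable
    z = np.zeros(n)                      # global consensus variable
    u = np.zeros(n)                      # scaled dual variable

    AtA = A.T @ A + rho * np.eye(n)      # local system for the x-update
    Atb = A.T @ b
    buf = np.empty(n)
    for _ in range(iters):
        # Local step: x = argmin (1/2)||A x - b||^2 + (rho/2)||x - z + u||^2.
        x = np.linalg.solve(AtA, Atb + rho * (z - u))
        # Consensus step: z = mean_k (x_k + u_k). This AllReduce is the only
        # global synchronization, so reducing its wait time speeds up training.
        comm.Allreduce(x + u, buf, op=MPI.SUM)
        z = buf / P
        u = u + x - z                    # dual update
    return z
```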
ISSN: 2364-1185; 2364-1541
DOI: 10.1007/s41019-022-00202-7