SK-Gradient: Efficient Communication for Distributed Machine Learning with Data Sketch

Bibliographic Details
Published in: 2023 IEEE 39th International Conference on Data Engineering (ICDE), pp. 2372-2385
Main Authors: Gui, Jie; Song, Yuchen; Wang, Zezhou; He, Chenhong; Huang, Qun
Format: Conference Proceeding
Language: English
Published: IEEE, 01.04.2023

Summary: With the explosive growth of data volume, distributed machine learning has become the mainstream approach for training deep neural networks. However, distributed training incurs non-trivial communication overhead, and various compression schemes have been proposed to reduce the communication volume among nodes. Nevertheless, existing schemes such as gradient quantization and gradient sparsification suffer from low compression ratios and/or high computational overhead. Recent studies advocate leveraging sketch techniques to assist these schemes, but the limitations of gradient quantization and gradient sparsification remain. In this paper, we propose SK-Gradient, a novel gradient compression scheme that builds solely on sketching. Its core component is FGC Sketch, a new sketch tailored to gradient compression: FGC Sketch precomputes the costly hash functions to reduce computational overhead, and its simplified design makes it amenable to GPU acceleration. In addition, SK-Gradient employs selective gradient compression and a periodic synchronization strategy to improve computational efficiency and compression accuracy. Compared with state-of-the-art schemes, SK-Gradient achieves up to 92.9% reduction in computational overhead and up to 95.2% improvement in training speed at the same compression ratio.
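
To make the abstract's idea of sketch-based gradient compression with precomputed hash functions concrete, the sketch below is a minimal, generic Count-Sketch-style compressor in NumPy. It is not the paper's FGC Sketch: the class name, parameters (num_rows, num_cols), and median-based decoding are illustrative assumptions; the only point carried over from the abstract is that per-coordinate hash mappings are drawn once up front, so each training step needs only vectorized scatter/gather operations rather than repeated hashing.

```python
import numpy as np


class PrecomputedCountSketch:
    """Illustrative Count-Sketch gradient compressor (not the paper's FGC Sketch).

    Bucket indices and signs for every gradient coordinate are precomputed
    once, so compress/decompress are pure scatter/gather passes with no
    per-step hashing.
    """

    def __init__(self, dim, num_rows=3, num_cols=1000, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        self.num_rows = num_rows
        self.num_cols = num_cols
        # Precomputed per-coordinate bucket indices and random signs, one row per hash.
        self.buckets = rng.integers(0, num_cols, size=(num_rows, dim))
        self.signs = rng.choice([-1.0, 1.0], size=(num_rows, dim))

    def compress(self, grad):
        """Project a dense gradient vector into a (num_rows x num_cols) sketch."""
        sketch = np.zeros((self.num_rows, self.num_cols))
        for r in range(self.num_rows):
            # Scatter-add the signed gradient values into their buckets.
            np.add.at(sketch[r], self.buckets[r], self.signs[r] * grad)
        return sketch

    def decompress(self, sketch):
        """Estimate each coordinate as the median of its signed bucket values."""
        estimates = np.stack([
            self.signs[r] * sketch[r, self.buckets[r]]
            for r in range(self.num_rows)
        ])
        return np.median(estimates, axis=0)


if __name__ == "__main__":
    dim = 10_000
    grad = np.random.randn(dim)
    # Simulate a gradient where a few coordinates dominate.
    grad[np.random.choice(dim, size=9_900, replace=False)] *= 0.01

    sk = PrecomputedCountSketch(dim, num_rows=5, num_cols=2_000)
    compressed = sk.compress(grad)        # what a worker would transmit
    recovered = sk.decompress(compressed) # what the receiver reconstructs

    ratio = grad.size / compressed.size
    err = np.linalg.norm(recovered - grad) / np.linalg.norm(grad)
    print(f"compression ratio ~{ratio:.1f}x, relative L2 error {err:.3f}")
```

In this toy setup the compression ratio is simply dim / (num_rows * num_cols); the paper's selective compression and periodic synchronization, which control which gradients are sketched and how often exact values are exchanged, are not modeled here.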
ISSN: 2375-026X
DOI: 10.1109/ICDE55515.2023.00183