AdapCC: Making Collective Communication in Distributed Machine Learning Adaptive
As deep learning (DL) models continue to grow in size, there is a pressing need for distributed model learning using a large number of devices (e.g., GPUs) and servers. Collective communication among devices/servers (for gradient synchronization, intermediate data exchange, etc.) introduces significant...
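To illustrate the collective-communication pattern the abstract refers to, here is a minimal sketch (not from the paper) of a ring all-reduce, the collective commonly used to sum gradients across data-parallel workers. The simulation below is hypothetical: it models each worker's gradient as a plain Python list and moves one chunk per worker per step, which is why per-worker traffic scales as roughly 2·(n−1)/n of the gradient size.

```python
def ring_allreduce(grads):
    """Simulated ring all-reduce: sum gradients across n workers.

    grads: list of equal-length per-worker gradient vectors.
    Returns one synchronized (summed) copy per worker.
    """
    n = len(grads)
    size = len(grads[0])
    chunks = [list(g) for g in grads]  # each worker's local buffer
    # Partition indices into n contiguous chunks, one "owned" per worker.
    bounds = [(i * size // n, (i + 1) * size // n) for i in range(n)]

    # Phase 1: reduce-scatter. At step t, worker w sends chunk (w - t) mod n
    # to its right neighbor, which adds it into its own copy. After n-1
    # steps, worker w holds the full sum for chunk (w + 1) mod n.
    for step in range(n - 1):
        for w in range(n):
            lo, hi = bounds[(w - step) % n]
            dst = (w + 1) % n
            for i in range(lo, hi):
                chunks[dst][i] += chunks[w][i]

    # Phase 2: all-gather. At step t, worker w forwards the fully reduced
    # chunk (w + 1 - t) mod n to its right neighbor, which overwrites its
    # copy. After n-1 steps every worker holds the complete sum.
    for step in range(n - 1):
        for w in range(n):
            lo, hi = bounds[(w + 1 - step) % n]
            dst = (w + 1) % n
            for i in range(lo, hi):
                chunks[dst][i] = chunks[w][i]

    return chunks
```

For example, with three workers holding gradients `[1, 2, 3]`, `[4, 5, 6]`, and `[7, 8, 9]`, every worker ends up with the element-wise sum `[12, 15, 18]`. Real systems (e.g., NCCL-backed training) overlap these send/receive steps across the network; the overhead of such exchanges at scale is the cost the paper targets.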
| Published in | Proceedings of the International Conference on Distributed Computing Systems, pp. 25–35 |
|---|---|
| Main Authors | , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 23.07.2024 |