Large-scale multi-machine multi-card pre-training method, system and device and server cluster
Format: Patent
Language: Chinese; English
Published: 30.11.2021
Summary: The invention belongs to the technical field of distributed training and discloses a large-scale multi-machine, multi-card pre-training method, system, device, and server cluster. Multiple machines and cards are deployed across multiple servers, and multi-machine multi-card parallelism is carried out on both homogeneous and heterogeneous mixed machine types. Large-scale multi-machine multi-card training and evaluation are carried out on the slm framework, implemented with the unsupervised feature-learning BYOL algorithm as an example, and on the Horovod framework, implemented with the PRP algorithm for unsupervised video-semantic learning. The training comprises environment configuration, task configuration, communication configuration, and task acceleration. According to the multi-machine multi-card large-scale training experiment, the batch size is high, the tr…
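A Horovod-based multi-machine multi-card job of the kind the abstract describes is typically launched with a single `horovodrun` command that lists the participating servers and the number of GPU slots on each. The host names (`server1`, `server2`), slot counts, and the script name `train_byol.py` below are illustrative assumptions, not values from the patent; only the `horovodrun` flags themselves are standard Horovod CLI syntax.

```shell
# Hypothetical launch of a data-parallel training job across two 8-GPU
# servers (16 processes total, one per GPU). Host names, slot counts,
# and the training script are assumed for illustration.
horovodrun -np 16 -H server1:8,server2:8 python train_byol.py
```

In this data-parallel scheme each process holds a full model replica, and Horovod averages gradients across all 16 workers with a ring-allreduce after every step, which is the communication pattern that makes the large effective batch size mentioned in the abstract possible.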
Bibliography: Application Number CN202111042840