Large-scale multi-machine multi-card pre-training method, system and device and server cluster

Bibliographic Details
Main Authors: LI GE, WANG YAOWEI, GUO MINGYUE, BAI XINBEI, REN YURUI
Format: Patent
Language: Chinese; English
Published: 30.11.2021
Summary: The invention belongs to the technical field of distributed training and discloses a large-scale multi-machine multi-card pre-training method, system, device, and server cluster. Multiple machines and multiple cards (GPUs) are deployed across multiple servers, and multi-machine multi-card parallelism is carried out on both homogeneous machine types and heterogeneous mixed machine types. Large-scale multi-machine multi-card training and evaluation are performed based on the slm framework, implemented with the unsupervised feature learning algorithm BYOL as an example, and based on the Horovod framework, implemented with the video semantic unsupervised learning algorithm PRP. The training comprises environment configuration, task configuration, communication configuration, and task acceleration. According to the multi-machine multi-card large-scale training experiments, the batch size is high, the tr…
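The BYOL algorithm named in the summary trains an online network to predict the representation produced by a slowly moving target network. The core of the objective can be sketched minimally in plain Python; this is a hedged illustration of the general BYOL technique, not the patent's implementation. The helper names `byol_loss` and `ema_update` are hypothetical, and real BYOL applies these operations to the outputs and weights of neural-network encoders, projectors, and predictors rather than to raw vectors.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (returned unchanged if it is all zeros).
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def byol_loss(online_pred, target_proj):
    # BYOL regression loss: 2 - 2 * cosine similarity between the online
    # network's prediction and the target network's projection.
    # It is 0 when the two directions match and 4 when they are opposite.
    p = l2_normalize(online_pred)
    z = l2_normalize(target_proj)
    return 2.0 - 2.0 * sum(a * b for a, b in zip(p, z))

def ema_update(target_params, online_params, tau=0.99):
    # The target network's weights are an exponential moving average (EMA)
    # of the online network's weights; no gradients flow to the target.
    return [tau * t + (1.0 - tau) * o
            for t, o in zip(target_params, online_params)]
```

In a distributed run of the kind the patent describes, each worker would compute `byol_loss` on its local batch, the gradients would be averaged across machines and cards (for example via Horovod allreduce), and `ema_update` would then refresh the target weights identically on every worker.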
Bibliography:Application Number: CN202111042840