Visual large model distributed training method and system

Bibliographic Details
Main Authors: LI GE, WANG YAOWEI, JI WEN, BAI XINBEI
Format: Patent
Language: Chinese, English
Published: 30.11.2021
Summary: The invention discloses a distributed training method and system for visual large models. The method comprises the following steps: constructing a distributed training system comprising a master control server, a plurality of GPU servers, a distributed storage server, and a storage network switch; determining a data loading mode according to the size of the data set used to train the visual large model; evaluating, according to the structural characteristics of the visual large model, the parameter counts and computation costs of the different types of network layer groups in the model, decomposing the model in light of the computing power and caching capability of the GPUs, and determining a parallel training scheme for the model; and training the model in a hybrid parallel mode, performing model aggregation, global model updating, and model distribution on the master control server, and evaluating the visual large model using a training set and a validation set.
Bibliography: Application Number: CN202110784131
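
The step in the summary that evaluates the parameter counts and computation costs of network layer groups can be illustrated with a minimal Python sketch. This is not the patented procedure: the names `LayerGroup`, `conv2d_cost`, `linear_cost`, and `choose_strategy` are hypothetical, the 16 bytes per parameter figure and the threshold rule are assumptions, and reading the GPU's "caching capability" as its memory capacity is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class LayerGroup:
    """Cost summary for one group of layers (hypothetical helper type)."""
    name: str
    params: int  # number of trainable parameters
    macs: int    # multiply-accumulate operations per sample, forward pass


def conv2d_cost(name, c_in, c_out, k, h_out, w_out):
    """Cost of a k x k convolution with bias and an h_out x w_out output map."""
    params = c_out * (c_in * k * k + 1)
    macs = c_out * c_in * k * k * h_out * w_out
    return LayerGroup(name, params, macs)


def linear_cost(name, n_in, n_out):
    """Cost of a fully connected layer with bias."""
    params = n_out * (n_in + 1)
    macs = n_in * n_out
    return LayerGroup(name, params, macs)


def choose_strategy(group, gpu_mem_bytes, bytes_per_param=16):
    """Assumed rule: split a group across GPUs when its training state
    does not fit on one device, otherwise replicate it.

    bytes_per_param=16 assumes fp32 weights, gradients, and two
    optimizer moments; the summary does not specify this.
    """
    needed = group.params * bytes_per_param
    return "model_parallel" if needed > gpu_mem_bytes else "data_parallel"


if __name__ == "__main__":
    groups = [
        conv2d_cost("stem_conv", c_in=3, c_out=64, k=7, h_out=112, w_out=112),
        linear_cost("classifier", n_in=4096, n_out=1000),
    ]
    tiny_gpu = 32 * 2**20  # deliberately small 32 MiB budget for illustration
    for g in groups:
        print(g.name, g.params, g.macs, choose_strategy(g, tiny_gpu))
```

In this sketch the convolution group is compute-heavy but parameter-light, so it is replicated, while the fully connected group is parameter-heavy and is split. Mixing the two decisions within one model is what a hybrid parallel scheme amounts to here.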