Visual large model distributed training method and system
Main Authors | , , , |
---|---|
Format | Patent |
Language | Chinese, English |
Published | 30.11.2021 |
Summary: | The invention discloses a distributed training method and system for large visual models, the method comprising: constructing a distributed training system that comprises a master control server, a plurality of GPU servers, a distributed storage server, and a storage network switch; determining a data loading mode according to the size of the data set used to train the large visual model; evaluating, according to the structural characteristics of the large visual model, the parameter counts and computation costs of the different types of network layer groups in the model, decomposing the model in light of the computing power and caching capability of the GPUs, and determining a parallel training scheme for the model; and training the model in a hybrid parallel mode, with model aggregation, global model updating, and model distribution performed by the master control server, and evaluating the large visual model using a training set and a validation set |
---|---|
Bibliography: | Application Number: CN202110784131 |
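
The summary's step of evaluating parameter quantities and calculation quantities per network layer group can be illustrated with a minimal sketch. The record does not give the patent's formulas or thresholds, so everything here is an assumption: the names `LayerGroup`, `conv2d_group`, `linear_group`, and `suggest_parallelism` are hypothetical, the counts use the standard formulas for convolution and fully connected layers, and the 16 bytes per parameter figure assumes fp32 training with an Adam-style optimizer.

```python
from dataclasses import dataclass


@dataclass
class LayerGroup:
    name: str
    params: int  # trainable parameters in the group
    macs: int    # forward multiply-accumulate operations per sample


def conv2d_group(name, c_in, c_out, k, h_out, w_out):
    """Standard counts for a k x k convolution with bias."""
    params = c_out * (c_in * k * k + 1)          # weights plus one bias per filter
    macs = c_out * c_in * k * k * h_out * w_out  # one MAC per weight per output pixel
    return LayerGroup(name, params, macs)


def linear_group(name, d_in, d_out):
    """Standard counts for a fully connected layer with bias."""
    params = d_out * (d_in + 1)
    macs = d_in * d_out
    return LayerGroup(name, params, macs)


def suggest_parallelism(group, gpu_memory_bytes, bytes_per_param=16):
    """Illustrative rule only, not the patent's scheme.

    Assumes 16 bytes per parameter: fp32 weight (4) + gradient (4)
    + two optimizer moments (8). Activations are ignored.
    """
    needed = group.params * bytes_per_param
    return "model_parallel" if needed > gpu_memory_bytes else "data_parallel"


if __name__ == "__main__":
    gpu_mem = 16 * 1024**3  # assumed 16 GiB of GPU memory
    groups = [
        conv2d_group("stem", 3, 64, 7, 112, 112),
        linear_group("head", 4096, 1000),
        linear_group("wide_embedding", 100_000, 100_000),
    ]
    for g in groups:
        print(g.name, g.params, g.macs, suggest_parallelism(g, gpu_mem))
```

Under these assumptions, groups whose parameter state fits on one GPU are replicated (data parallel), while groups that do not fit are split across GPUs (model parallel), which is one simple way to arrive at a hybrid scheme of the kind the summary describes.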