Breadth-First Pipeline Parallelism

We introduce Breadth-First Pipeline Parallelism, a novel training schedule that optimizes the combination of pipeline and data parallelism. Breadth-First Pipeline Parallelism lowers training time, cost, and memory usage by combining high GPU utilization with a small batch size per GPU, and by maki...

Bibliographic Details
Main Author Lamy-Poirier, Joel
Format Journal Article
Language English
Published 10.11.2022