A Comparative Survey: Reusing Small Pre-Trained Models for Efficient Large Model Training

Bibliographic Details
Published in: SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 56-63
Main Authors: Pandey, Dhroov; Ghebremichael, Jonah; Qi, Zongqing; Shu, Tong
Format: Conference Proceeding
Language: English
Published: IEEE, 17.11.2024
DOI: 10.1109/SCW63240.2024.00015

More Information
Summary: Training large language models is becoming increasingly complex due to the rapid expansion in their size, resulting in significant computational costs. To address this challenge, various model growth methodologies have been proposed to leverage smaller pre-trained models to incrementally build larger models and reduce computational requirements. These methods typically involve mapping parameters from small models to large ones using either static functions or learned mappings. Although these approaches have demonstrated effectiveness, there is a lack of comprehensive comparative evaluations in the literature. Additionally, combining different methodologies could potentially yield superior performance. This study provides a uniform evaluation of multiple state-of-the-art model growth techniques and their combinations, revealing that efficient combination techniques can reduce the training cost (in TFLOPs) of individual methods by up to 80%.
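To give a concrete sense of the "static function" parameter mappings the summary refers to, the sketch below implements a Net2WiderNet-style, function-preserving width expansion of a single fully connected layer: each new hidden unit copies an existing one, and the outgoing weights are rescaled so the widened network computes the same function before further training. This is a minimal illustration, not code from the surveyed paper; the function name net2wider, the array shapes, and the use of NumPy are assumptions made for the example.

import numpy as np

def net2wider(W1, b1, W2, new_width, rng=None):
    """Function-preserving width expansion of one hidden layer
    (a Net2WiderNet-style static mapping; illustrative sketch only).

    W1: (hidden, in_dim)  weights of the layer being widened
    b1: (hidden,)         its bias
    W2: (out_dim, hidden) weights of the next layer
    Returns (W1_new, b1_new, W2_new) with hidden size `new_width`
    that compute the same function as the original pair of layers.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    hidden = W1.shape[0]
    assert new_width >= hidden, "can only grow, not shrink"

    # Mapping g: the first `hidden` new units copy themselves; the extra
    # units copy randomly chosen existing units.
    g = np.concatenate([np.arange(hidden),
                        rng.integers(0, hidden, new_width - hidden)])

    # Number of copies of each original unit after widening.
    counts = np.bincount(g, minlength=hidden)

    # Widen the incoming layer by duplicating rows (and bias entries).
    W1_new, b1_new = W1[g], b1[g]

    # Rescale the outgoing weights so the duplicated units together
    # contribute exactly what the original unit did.
    W2_new = W2[:, g] / counts[g]
    return W1_new, b1_new, W2_new

# Quick sanity check with a ReLU MLP: outputs match before and after growth.
x = np.random.randn(4, 8)
W1, b1, W2 = np.random.randn(16, 8), np.random.randn(16), np.random.randn(3, 16)
W1w, b1w, W2w = net2wider(W1, b1, W2, new_width=24)
y_small = np.maximum(W1 @ x.T + b1[:, None], 0).T @ W2.T
y_large = np.maximum(W1w @ x.T + b1w[:, None], 0).T @ W2w.T
assert np.allclose(y_small, y_large)

Because the duplicated units are rescaled in the next layer, the grown model starts training from the small model's function rather than from scratch, which is the property static growth operators of this kind aim to preserve; learned mappings replace the fixed copy-and-rescale rule with trained projection parameters.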