A Comparative Survey: Reusing Small Pre-Trained Models for Efficient Large Model Training

Bibliographic Details
Published in: SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 56-63
Main Authors: Pandey, Dhroov; Ghebremichael, Jonah; Qi, Zongqing; Shu, Tong
Format: Conference Proceeding
Language: English
Published: IEEE, 17.11.2024
DOI: 10.1109/SCW63240.2024.00015

More Information
Summary: Training large language models is becoming increasingly complex due to the rapid expansion in their size, resulting in significant computational costs. To address this challenge, various model growth methodologies have been proposed to leverage smaller pre-trained models to incrementally build larger models and reduce computational requirements. These methods typically involve mapping parameters from small models to large ones using either static functions or learned mappings. Although these approaches have demonstrated effectiveness, there is a lack of comprehensive comparative evaluations in the literature. Additionally, combining different methodologies could potentially yield superior performance. This study provides a uniform evaluation of multiple state-of-the-art model growth techniques and their combinations, revealing that efficient combination techniques can reduce the training cost (in TFLOPs) of individual methods by up to 80%.
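To give a concrete sense of the "static function" parameter mappings the summary refers to, the sketch below implements a Net2WiderNet-style, function-preserving width expansion of a single fully connected layer: each new hidden unit copies an existing one, and the outgoing weights are rescaled so the widened network computes the same function before further training. This is a minimal illustration, not code from the surveyed paper; the function name net2wider, the array shapes, and the use of NumPy are assumptions made for the example.

import numpy as np

def net2wider(W1, b1, W2, new_width, rng=None):
    """Function-preserving width expansion of one hidden layer
    (a Net2WiderNet-style static mapping; illustrative sketch only).

    W1: (hidden, in_dim)  weights of the layer being widened
    b1: (hidden,)         its bias
    W2: (out_dim, hidden) weights of the next layer
    Returns (W1_new, b1_new, W2_new) with hidden size `new_width`
    that compute the same function as the original pair of layers.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    hidden = W1.shape[0]
    assert new_width >= hidden, "can only grow, not shrink"

    # Mapping g: the first `hidden` new units copy themselves; the extra
    # units copy randomly chosen existing units.
    g = np.concatenate([np.arange(hidden),
                        rng.integers(0, hidden, new_width - hidden)])

    # Number of copies of each original unit after widening.
    counts = np.bincount(g, minlength=hidden)

    # Widen the incoming layer by duplicating rows (and bias entries).
    W1_new, b1_new = W1[g], b1[g]

    # Rescale the outgoing weights so the duplicated units together
    # contribute exactly what the original unit did.
    W2_new = W2[:, g] / counts[g]
    return W1_new, b1_new, W2_new

# Quick sanity check with a ReLU MLP: outputs match before and after growth.
x = np.random.randn(4, 8)
W1, b1, W2 = np.random.randn(16, 8), np.random.randn(16), np.random.randn(3, 16)
W1w, b1w, W2w = net2wider(W1, b1, W2, new_width=24)
y_small = np.maximum(W1 @ x.T + b1[:, None], 0).T @ W2.T
y_large = np.maximum(W1w @ x.T + b1w[:, None], 0).T @ W2w.T
assert np.allclose(y_small, y_large)

Because the duplicated units are rescaled in the next layer, the grown model starts training from the small model's function rather than from scratch, which is the property static growth operators of this kind aim to preserve; learned mappings replace the fixed copy-and-rescale rule with trained projection parameters.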