Comparative Study of Large Language Model Architectures on Frontier
Main Authors | , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 01.02.2024 |
Summary: | Large language models (LLMs) have garnered significant attention in both the
AI community and beyond. Among these, the Generative Pre-trained Transformer
(GPT) has emerged as the dominant architecture, spawning numerous variants.
However, these variants have undergone pre-training under diverse conditions,
including variations in input data, data preprocessing, and training
methodologies, resulting in a lack of controlled comparative studies. Here we
meticulously examine two prominent open-source GPT architectures, GPT-NeoX and
LLaMA, leveraging the computational power of Frontier, the world's first
exascale supercomputer. Employing the same materials science text corpus and a
comprehensive end-to-end pipeline, we conduct a comparative analysis of their
training and downstream performance. Our efforts culminate in achieving
state-of-the-art performance on a challenging materials science benchmark.
Furthermore, we investigate computational and energy efficiency, and propose a
computationally efficient method for architecture design. To our knowledge,
these pre-trained models represent the largest available for materials science.
Our findings provide practical guidance for building LLMs on HPC platforms. |
DOI: | 10.48550/arxiv.2402.00691 |