Toward a Holistic Performance Evaluation of Large Language Models Across Diverse AI Accelerators

Bibliographic Details
Published in: 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1-10
Main Authors: Emani, Murali; Foreman, Sam; Sastry, Varuni; Xie, Zhen; Raskar, Siddhisanket; Arnold, William; Thakur, Rajeev; Vishwanath, Venkatram; Papka, Michael E.; Shanmugavelu, Sanjif; Gandhi, Darshan; Zhao, Hengyu; Ma, Dun; Ranganath, Kiran; Weisner, Rick; Chen, Jiunn-yeu; Yang, Yuting; Vassilieva, Natalia; Zhang, Bin C.; Howland, Sylvia; Tsyplikhin, Alexander
Format: Conference Proceeding
Language: English
Published: IEEE, 27.05.2024
Summary: Artificial intelligence (AI) methods have become critical in scientific applications to help accelerate scientific discovery. Large language models (LLMs) are considered a promising approach to some challenging problems because of their superior generalization capabilities across domains. The effectiveness of the models and the accuracy of the applications are contingent upon their efficient execution on the underlying hardware infrastructure. Specialized AI accelerator hardware systems have recently become available for accelerating AI applications. However, the comparative performance of these AI accelerators on large language models has not been previously studied. In this paper, we systematically study LLMs on multiple AI accelerators and GPUs and evaluate their performance characteristics for these models. We evaluate these systems with (i) a micro-benchmark using a core transformer block, (ii) a GPT-2 model, and (iii) an LLM-driven science use case, GenSLM. We present our findings and analyses of the models' performance to better understand the intrinsic capabilities of AI accelerators. Furthermore, our analysis takes into account key factors such as sequence lengths, scaling behavior, and sensitivity to gradient accumulation steps.
DOI: 10.1109/IPDPSW63119.2024.00016
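
The summary describes a micro-benchmark built around a core transformer block, evaluated across sequence lengths. The sketch below is a rough illustration of that style of measurement, not the paper's actual harness: it times a single PyTorch TransformerEncoderLayer over a sweep of sequence lengths, and the model dimensions, batch size, sequence lengths, and step counts are assumptions chosen only for the example.

# Illustrative transformer-block micro-benchmark across sequence lengths.
# All sizes and sequence lengths here are assumed values for demonstration;
# they are not taken from the paper.
import time
import torch
import torch.nn as nn

def benchmark_block(d_model=768, n_heads=12, batch_size=8,
                    seq_lens=(128, 512, 1024, 2048), steps=20):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    block = nn.TransformerEncoderLayer(
        d_model=d_model, nhead=n_heads, batch_first=True).to(device)
    results = {}
    for seq_len in seq_lens:
        x = torch.randn(batch_size, seq_len, d_model, device=device)
        # Warm-up iterations so one-time costs (kernel selection, caching)
        # do not skew the measured throughput.
        for _ in range(3):
            block(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(steps):
            block(x)
        if device == "cuda":
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
        # Report throughput as tokens processed per second at this length.
        results[seq_len] = batch_size * seq_len * steps / elapsed
    return results

if __name__ == "__main__":
    for seq_len, tput in benchmark_block().items():
        print(f"seq_len={seq_len:5d}  throughput={tput:,.0f} tokens/s")

Sweeping sequence length in this way exposes how attention cost and memory traffic grow with context size, which is one of the factors the paper's analysis examines across the different accelerators.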