Implementation and Evaluation of LLM on a CGLA

Bibliographic Details
Published in: International Symposium on Computing and Networking (Online), pp. 252–258
Main Authors: Uetani, Hitoaki; Nakashima, Yasuhiko
Format: Conference Proceeding
Language: English
Published: IEEE, 26.11.2024

Summary: Generative AI services such as ChatGPT are currently attracting global attention. At the same time, the shortage of processing resources like GPUs and rising power demand have become significant issues, making the balance between processing performance and power efficiency a critical challenge. In this study, we evaluated the performance of Large Language Models (LLMs) on the IMAX3 prototype of our research group's proposed Linear Array Coarse-Grained Reconfigurable Architecture (CGLA). IMAX3 is implemented on a Field Programmable Gate Array (FPGA), and we compared its processing speed and power efficiency with other computing platforms such as CPUs. IMAX aims to provide a versatile and efficient computing platform for services including AI, with ease of use as another vital aspect. During the evaluation, we made improvements such as adding a conversion table from 4-bit integers to single-precision floating-point numbers in the IMAX3 floating-point unit. As a result, we successfully ran GGML, a library for running LLMs on CPUs, on the IMAX3. The computation time ratio reached 80%, demonstrating the potential of CGLA as a viable computing platform for LLMs.
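The abstract mentions adding a conversion table from 4-bit integers to single-precision floats in the IMAX3 floating-point unit. The paper's record here does not give the actual table contents or the quantization scheme, so the following is only a minimal software sketch of the general idea, assuming a plain two's-complement int4 mapping with a per-block scale factor, as in common 4-bit weight quantization:

```python
# Software sketch of a 16-entry 4-bit-integer -> single-precision lookup
# table, analogous in spirit to the hardware conversion table described
# in the abstract. The real IMAX3/GGML table contents are assumptions here.

import struct

# 4-bit two's-complement code -> float value: 0..7 -> 0..7, 8..15 -> -8..-1
INT4_TO_F32 = [float(c if c < 8 else c - 16) for c in range(16)]

def dequantize_block(codes, scale):
    """Map 4-bit codes to float32 values: w = scale * int4(code)."""
    out = []
    for c in codes:
        f = INT4_TO_F32[c & 0xF] * scale
        # round-trip through IEEE-754 binary32 to mimic a single-precision FPU
        out.append(struct.unpack("<f", struct.pack("<f", f))[0])
    return out

print(dequantize_block([0x0, 0x7, 0x8, 0xF], 0.5))  # -> [0.0, 3.5, -4.0, -0.5]
```

A fixed 16-entry table like this is attractive in hardware because the int4-to-float conversion becomes a single lookup instead of a sign-extend and integer-to-float pipeline stage.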
ISSN: 2379-1896
DOI: 10.1109/CANDAR64496.2024.00040