INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers

Bibliographic Details
Published in: arXiv.org
Main Authors: Nair, Lakshmi; Bernadskiy, Mikhail; Madhavan, Arulselvan; Chan, Craig; Basumallik, Ayon; Bunandar, Darius
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 07.07.2023

Summary: The recent rise of large language models (LLMs) has resulted in increased efforts towards running LLMs at reduced precision. Running LLMs at lower precision helps meet resource constraints and furthers their democratization, enabling users to run billion-parameter LLMs on their personal devices. To supplement this ongoing effort, we propose INT-FP-QSim: an open-source simulator that enables flexible evaluation of LLMs and vision transformers at various numerical precisions and formats. INT-FP-QSim leverages existing open-source repositories such as TensorRT, QPytorch and AIMET to form a combined simulator that supports various floating-point and integer formats. With the help of our simulator, we survey the impact of different numerical formats on the performance of LLMs and vision transformers with 4-bit weights and 4-bit or 8-bit activations. We also compare the impact of recently proposed methods such as Adaptive Block Floating Point, SmoothQuant, GPTQ and RPTQ on model performance. We hope INT-FP-QSim will enable researchers to flexibly simulate models at various precisions to support further research in the quantization of LLMs and vision transformers.
ISSN: 2331-8422
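
Note: The summary above describes simulating models at reduced precision. Quantization simulators of this kind typically emulate low-precision arithmetic in floating point via quantize-dequantize ("fake quantization"). The following is a minimal, hypothetical Python/PyTorch sketch of symmetric integer fake quantization for 4-bit weights and 8-bit activations; the function name and parameters are illustrative assumptions, not INT-FP-QSim's actual API.

    import torch

    def fake_quantize_int(x: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
        # Hypothetical helper: simulate symmetric integer quantization by
        # rounding to the nearest representable level, then dequantizing.
        qmax = 2 ** (num_bits - 1) - 1    # e.g. 7 for INT4, 127 for INT8
        scale = x.abs().max() / qmax      # per-tensor scale factor
        q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
        return q * scale                  # float values snapped to an integer grid

    # Usage: 4-bit weights, 8-bit activations, the setting surveyed in the paper.
    weight = torch.randn(128, 256)
    activation = torch.randn(32, 256)
    output = fake_quantize_int(activation, 8) @ fake_quantize_int(weight, 4).t()

Because quantize-dequantize keeps all arithmetic in floating point, such simulation does not require hardware support for the low-precision formats being evaluated.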