Leftover: Improving Large Language Model Inference Efficiency by Leveraging Idle Resources

Bibliographic Details
Published in: 2023 International Conference on High Performance Big Data and Intelligent Systems (HDIS), pp. 60-65
Main Authors: Duan, Xu; Ye, Kejiang
Format: Conference Proceeding
Language: English
Published: IEEE, 06.12.2023
DOI: 10.1109/HDIS60872.2023.10499636

Summary: Large language models and other deep learning models are used in many application areas that place heavy demands on computing resources but do not have strict real-time response requirements. Recent algorithmic innovations have primarily focused on optimizing inference latency for large language models, without considering the throughput of inference tasks. Meanwhile, data centers often host many underutilized idle resources or offer cost-effective preemptible instances, which inference tasks can use to improve inference efficiency. In this paper, we introduce Leftover, a general-purpose large language model inference system that encompasses model compilation, deployment, and task scheduling infrastructure. Leftover leverages idle or preemptible resources to handle inference tasks that are insensitive to latency but require substantial computational power, leading to significant improvements in cluster computing performance. We evaluate Leftover with real-world workloads and simulated preemptive experiments, achieving up to an 11.28x increase in resource utilization compared to baseline methods and a 1.45x performance improvement compared to basic preemptive inference approaches.