TSCompiler: efficient compilation framework for dynamic-shape models

Bibliographic Details
Published in Science China. Information sciences Vol. 67; no. 10; p. 200403
Main Authors Luo, Xiang; Zhang, Chen; Geng, Chenbo; Yi, Yanzhi; Hu, Jiahui; Zhang, Renwei; Zhang, Zhen; Consolaro, Gianpietro; Yang, Fan; Lu, Tun; Gu, Ning; Shang, Li
Format Journal Article
Language English
Published Beijing Science China Press 01.10.2024
Springer Nature B.V
Summary: Today’s deep learning models increasingly must handle dynamic-shape tensors and computations whose shape information is unknown at compile time and varies over a nearly infinite range at runtime. This shape dynamism poses major challenges for existing compilation pipelines designed for static models, which optimize tensor programs by relying on exact shape values. This paper presents TSCompiler, an end-to-end compilation framework for dynamic-shape models. TSCompiler first introduces a symbolic shape propagation algorithm that recovers symbolic shape information at compile time, enabling subsequent optimizations. It then partitions the shape-annotated computation graph into multiple subgraphs and fine-tunes the backbone operators of each subgraph within a hardware-aligned search space to find a collection of high-performance schedules. Based on dependence analysis, TSCompiler propagates the explored backbone schedule to other fusion groups within the same subgraph, generating a set of parameterized tensor programs for the fused cases. At runtime, TSCompiler uses an occupancy-targeted cost model to select among the pre-compiled tensor programs for varied tensor shapes. Extensive evaluations show that TSCompiler achieves state-of-the-art speedups for dynamic-shape models: it improves kernel efficiency by up to 3.97× on NVIDIA RTX 3090 and 10.30× on NVIDIA A100, and achieves speedups of up to five orders of magnitude in end-to-end latency.
ISSN: 1674-733X, 1869-1919
DOI: 10.1007/s11432-024-4071-6
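
Illustration: the summary says that, at runtime, an occupancy-targeted cost model selects among pre-compiled tensor programs for the shape that actually arrives. The Python sketch below shows one way such a selection step could look. It is not TSCompiler's cost model. The Schedule class, its fields, the scoring formula, the candidate list, and the count of 80 compute units are all assumptions made up for this example.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Schedule:
    """A hypothetical pre-compiled kernel variant, described by its output tile."""
    name: str
    tile_m: int
    tile_n: int
    tile_efficiency: float  # assumed relative per-tile throughput, measured offline


def score(s: Schedule, m: int, n: int, num_units: int) -> float:
    """Toy occupancy-oriented score in (0, 1] for an m-by-n output (higher is better)."""
    blocks_m, blocks_n = ceil(m / s.tile_m), ceil(n / s.tile_n)
    blocks = blocks_m * blocks_n
    # Blocks run in "waves" of at most num_units; a partly filled wave leaves units idle.
    waves = ceil(blocks / num_units)
    wave_efficiency = blocks / (waves * num_units)
    # Fraction of the padded tile grid that covers real output elements.
    useful_fraction = (m * n) / (blocks_m * s.tile_m * blocks_n * s.tile_n)
    return wave_efficiency * useful_fraction * s.tile_efficiency


def select(schedules, m: int, n: int, num_units: int) -> Schedule:
    """Pick the highest-scoring pre-compiled variant for the runtime shape."""
    return max(schedules, key=lambda s: score(s, m, n, num_units))


# Assumed candidates: a large tile that is efficient per tile, and a small tile
# that fills the device more evenly when the output is small.
CANDIDATES = [
    Schedule("big", tile_m=128, tile_n=128, tile_efficiency=1.0),
    Schedule("small", tile_m=32, tile_n=32, tile_efficiency=0.6),
]

small_case = select(CANDIDATES, 64, 64, num_units=80)
large_case = select(CANDIDATES, 4096, 4096, num_units=80)
print(small_case.name, large_case.name)
```

With these assumed numbers, the 64 × 64 output favors the small tile, because a single 128 × 128 block would leave most compute units idle and most of its tile padded, while the 4096 × 4096 output favors the large tile, because both variants fill the device and the per-tile efficiency term decides.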