Task-aware Scheduling and Performance Optimization on Yitian710 SoC for GEMM-based Workloads on the Cloud


Bibliographic Details
Published in: 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS), pp. 1-5
Main Authors: Yu, Guosheng; Lv, Zhihong; Wang, Haijiang; Huang, Zilong; Chen, Jicheng
Format: Conference Proceeding
Language: English
Published: IEEE, 11.06.2023
Summary: The YiTian710 SoC is a server processor based on the Arm Neoverse N2 architecture, developed by T-HEAD Semiconductor Co., Ltd. to accelerate compute-intensive tasks in Alicloud, where ML-related workloads play an important role in various applications. General Matrix Multiplication (GEMM) is the fundamental and most important computing kernel routine, used extensively in ML workloads. Typically, a whole GEMM workload is partitioned into a series of blocks, and the resulting sub-tasks are carefully arranged to exploit the parallel hardware. This approach does not suit cloud workloads, however, which process multiple tasks concurrently and expect guaranteed QoS for commercial reasons. We introduce a task-aware parallel scheduling method to process ML workloads and balance the response delay and throughput of the YiTian710 ECS instance. We further design a multi-thread scheduling algorithm with a two-level division of the GEMM sub-tasks to achieve high efficiency. Optimized GEMM kernels are developed to attain optimal performance. We evaluate performance on YiTian710-based Alicloud ECS instances for different applications. The results show that our method achieves remarkable performance improvements across these applications.
ISSN: 2834-9857
DOI: 10.1109/AICAS57966.2023.10168586
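
The summary mentions partitioning a GEMM workload into blocks and dividing the sub-tasks at two levels across threads. This record does not describe the authors' algorithm, so the following is only a minimal illustrative sketch of the general idea, not the paper's method. It assumes a first level that hands each worker a disjoint row panel of the output and a second level that walks that panel in small tiles. The names `split_range`, `gemm_panel`, `blocked_gemm`, and the `workers` and `tile` parameters are all hypothetical. Pure-Python threads will not give a real speedup here; the sketch shows only the structure of the partitioning.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n, parts):
    """Split range(n) into at most `parts` contiguous, near-equal (start, stop) chunks."""
    parts = max(1, min(parts, n))
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def gemm_panel(A, B, C, row_start, row_stop, tile):
    """Second level (assumed): accumulate rows [row_start, row_stop) of C = A @ B tile by tile."""
    k_dim, n_dim = len(B), len(B[0])
    for i0 in range(row_start, row_stop, tile):
        for j0 in range(0, n_dim, tile):
            for k0 in range(0, k_dim, tile):
                # Multiply one tile of A by one tile of B into one tile of C.
                for i in range(i0, min(i0 + tile, row_stop)):
                    row_a, row_c = A[i], C[i]
                    for k in range(k0, min(k0 + tile, k_dim)):
                        a, row_b = row_a[k], B[k]
                        for j in range(j0, min(j0 + tile, n_dim)):
                            row_c[j] += a * row_b[j]


def blocked_gemm(A, B, workers=4, tile=2):
    """First level (assumed): give each worker a disjoint row panel of C."""
    m_dim, n_dim = len(A), len(B[0])
    C = [[0.0] * n_dim for _ in range(m_dim)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(gemm_panel, A, B, C, lo, hi, tile)
            for lo, hi in split_range(m_dim, workers)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return C
```

Because each worker writes only to its own rows of `C`, no locking is needed in this sketch. A task-aware scheduler of the kind the summary describes would presumably also choose the number of workers per task to trade response delay against throughput, which this example does not attempt.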