Serving Multi-DNN Workloads on FPGAs: A Coordinated Architecture, Scheduling, and Mapping Perspective
Published in: IEEE Transactions on Computers, Vol. 72, No. 5, pp. 1314-1328
Main Authors:
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.05.2023
Summary: Deep Neural Network (DNN) INFerence-as-a-Service (INFaaS) is the dominant workload in current data centers, for which FPGAs are promising hardware platforms because of their high flexibility and energy efficiency. The dynamic and multi-tenant nature of INFaaS requires careful design in three aspects: multi-tenant architecture, multi-DNN scheduling, and multi-core mapping. These three factors are critical to system latency and energy efficiency but are also challenging to optimize since they are tightly coupled and correlated. This paper proposes H3M, an automatic Design Space Exploration (DSE) framework to jointly optimize the architecture, scheduling, and mapping for serving INFaaS on cloud FPGAs. H3M explores: (1) the architecture design space with Heterogeneous spatial Multi-tenant sub-accelerators, (2) layer-wise scheduling for Heterogeneous Multi-DNN workloads, and (3) single-layer mapping to the Homogeneous Multi-core architecture. H3M beats state-of-the-art multi-tenant DNN accelerators, Planaria and Herald, by up to 7.5× and 3.6× in Energy-Delay-Product (EDP) reduction on the ASIC platform. On the Xilinx U200 and U280 FPGA platforms, H3M offers 2.1-5.7× and 1.8-9.0× EDP reduction over Herald.
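The abstract evaluates designs by Energy-Delay Product (EDP), i.e., energy multiplied by latency, over a coupled space of architecture, scheduling, and mapping choices. The sketch below is a minimal, hypothetical illustration of EDP-driven joint design space exploration in Python; the candidate encodings, the simple cost models, and the exhaustive search are assumptions made here for illustration and are not the actual H3M models or search algorithm described in the paper.

```python
from itertools import product

# Hypothetical stand-in cost models; a real flow would use analytical models
# or simulation of the FPGA sub-accelerators.

def latency(arch, sched, mapping, workload):
    # Assumed latency model: total ops / sustained throughput, scaled by
    # schedule- and mapping-induced inefficiencies.
    ops = sum(layer["ops"] for layer in workload)
    return ops / arch["throughput"] * sched["slowdown"] * mapping["util_penalty"]

def energy(arch, sched, mapping, workload):
    # Assumed energy model: energy per op times total ops, with a mapping penalty.
    ops = sum(layer["ops"] for layer in workload)
    return ops * arch["energy_per_op"] * mapping["util_penalty"]

def joint_dse(archs, scheds, mappings, workload):
    """Exhaustively search the coupled (architecture, schedule, mapping) space
    and return the tuple that minimizes EDP = energy * latency."""
    best, best_edp = None, float("inf")
    for arch, sched, mapping in product(archs, scheds, mappings):
        lat = latency(arch, sched, mapping, workload)
        edp = energy(arch, sched, mapping, workload) * lat
        if edp < best_edp:
            best, best_edp = (arch, sched, mapping), edp
    return best, best_edp

if __name__ == "__main__":
    # Illustrative candidates: two sub-accelerator configurations, two layer-wise
    # schedules, and two multi-core mappings for a tiny two-layer workload.
    workload = [{"ops": 1e9}, {"ops": 5e8}]
    archs = [{"throughput": 1e12, "energy_per_op": 1.0e-12},
             {"throughput": 5e11, "energy_per_op": 0.6e-12}]
    scheds = [{"slowdown": 1.0}, {"slowdown": 1.2}]
    mappings = [{"util_penalty": 1.0}, {"util_penalty": 1.1}]
    best, edp = joint_dse(archs, scheds, mappings, workload)
    print("best (arch, sched, mapping):", best, "EDP:", edp)
```

The key point the sketch captures is that the three decisions are evaluated jointly rather than in isolation, since a schedule that looks good for one architecture may be inferior once the mapping penalty is accounted for.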
ISSN: 0018-9340, 1557-9956
DOI: 10.1109/TC.2022.3214113