HelmGemm: Managing GPUs and FPGAs for Transprecision GEMM Workloads in Containerized Environments

Bibliographic Details
Published in: 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP), pp. 71-74
Main Authors: Diamantopoulos, Dionysios; Hagleitner, Christoph
Format: Conference Proceeding
Language: English
Published: IEEE, 01.07.2019

Summary: Major global vendors, including Google, IBM, Facebook, and Amazon, have recently provided containerized system configurations as a competitive alternative to traditional hypervisor-based virtualization thanks to their rapid deployment, efficiency, compatibility, and maintainability. As in traditional cloud environments, energy consumption still constitutes the lion's share of overall infrastructure operating expenses. Most public and private cloud providers have coupled their datacenters with accelerators such as GPUs and FPGAs to improve the energy efficiency of their systems. However, it remains a challenging task to manage such heterogeneous systems and share resources in multi-tenant environments while improving energy efficiency. To address this need, we propose HelmGemm, a system-level component to support energy-efficient computing on CPU-GPU-FPGA heterogeneous architectures for container services. HelmGemm is application-specific to workloads featuring the BLAS3 GEMM routine and allows precision selection over the course of the computation, i.e., a technique that recently gave rise to the term "transprecision computing". By evaluating HelmGemm on a POWER9 system with 4×V100 GPUs and 2×9V3 FPGAs, we succeeded in improving the average energy efficiency by up to 2.3× in inter-scale containerized configurations across three representative GEMM-based cloud applications in the field of machine learning, i.e., speech recognition, language modeling, and deep neural networks.
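The core idea behind transprecision GEMM, i.e. selecting a reduced precision for the bulk of the multiply work while keeping enough accuracy for the application, can be sketched in a few lines. The snippet below is a minimal illustration, not HelmGemm's implementation: it assumes NumPy and emulates a mixed-precision GEMM by casting the operands to `float16` and accumulating in `float32`, then measures the relative error against a full-precision reference. The function name `transprecision_gemm` and the precision choices are hypothetical.

```python
import numpy as np

def transprecision_gemm(a, b, low=np.float16, acc=np.float32):
    """Emulated transprecision GEMM: multiply reduced-precision
    operands, accumulate partial sums in a wider format."""
    a_lo = a.astype(low)  # quantize inputs to the low precision
    b_lo = b.astype(low)
    # Widen before the matmul so accumulation happens in `acc` precision,
    # mimicking hardware that keeps a wide accumulator.
    return a_lo.astype(acc) @ b_lo.astype(acc)

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64)).astype(np.float32)
b = rng.standard_normal((64, 64)).astype(np.float32)

approx = transprecision_gemm(a, b)
exact = a @ b
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(f"relative error of float16-input GEMM: {rel_err:.2e}")
```

For workloads such as inference in deep neural networks, a relative error at this scale is typically tolerable, which is what makes precision selection an energy-saving knob: smaller operand formats reduce data movement and let accelerators use denser arithmetic units.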
ISSN:2160-052X
DOI:10.1109/ASAP.2019.00-27