HelmGemm: Managing GPUs and FPGAs for Transprecision GEMM Workloads in Containerized Environments

Bibliographic Details
Published in: 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP), pp. 71-74
Main Authors: Diamantopoulos, Dionysios; Hagleitner, Christoph
Format: Conference Proceeding
Language: English
Published: IEEE, 01.07.2019

Summary: Major global vendors, including Google, IBM, Facebook, and Amazon, have recently provided containerized system configurations as a competitive alternative to traditional hypervisor-based virtualization thanks to their rapid deployment, efficiency, compatibility, and maintainability. As in traditional cloud environments, energy consumption still constitutes the lion's share of overall infrastructure operating expenses. Most public and private cloud providers have coupled their datacenters with accelerators such as GPUs and FPGAs to improve the energy efficiency of their systems. However, it remains a challenging task to manage such heterogeneous systems and share resources in multi-tenant environments while improving energy efficiency. To address this need, we propose HelmGemm, a system-level component to support energy-efficient computing on CPU-GPU-FPGA heterogeneous architectures for container services. HelmGemm is application-specific to workloads featuring the BLAS3 GEMM routine and allows precision selection over the course of the computation, i.e., a technique that recently gave rise to the term "transprecision computing". By evaluating HelmGemm on a POWER9 system with 4×V100 GPUs and 2×9V3 FPGAs, we succeeded in improving the average energy efficiency by up to 2.3× in inter-scale containerized configurations across three representative GEMM-based cloud applications in the field of machine learning, i.e., speech recognition, language modeling, and deep neural networks.
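The core idea behind transprecision GEMM, i.e. selecting a reduced precision for the bulk of the multiply work while keeping enough accuracy for the application, can be sketched in a few lines. The snippet below is a minimal illustration, not HelmGemm's implementation: it assumes NumPy and emulates a mixed-precision GEMM by casting the operands to `float16` and accumulating in `float32`, then measures the relative error against a full-precision reference. The function name `transprecision_gemm` and the precision choices are hypothetical.

```python
import numpy as np

def transprecision_gemm(a, b, low=np.float16, acc=np.float32):
    """Emulated transprecision GEMM: multiply reduced-precision
    operands, accumulate partial sums in a wider format."""
    a_lo = a.astype(low)  # quantize inputs to the low precision
    b_lo = b.astype(low)
    # Widen before the matmul so accumulation happens in `acc` precision,
    # mimicking hardware that keeps a wide accumulator.
    return a_lo.astype(acc) @ b_lo.astype(acc)

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64)).astype(np.float32)
b = rng.standard_normal((64, 64)).astype(np.float32)

approx = transprecision_gemm(a, b)
exact = a @ b
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(f"relative error of float16-input GEMM: {rel_err:.2e}")
```

For workloads such as inference in deep neural networks, a relative error at this scale is typically tolerable, which is what makes precision selection an energy-saving knob: smaller operand formats reduce data movement and let accelerators use denser arithmetic units.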
ISSN:2160-052X
DOI:10.1109/ASAP.2019.00-27