NEST‐C: A deep learning compiler framework for heterogeneous computing systems with artificial intelligence accelerators
| Published in | ETRI Journal, Vol. 46, no. 5, pp. 851–864 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | Electronics and Telecommunications Research Institute (ETRI), 01.10.2024 |
| Subjects | |
| ISSN | 1225-6463; 2233-7326 |
| DOI | 10.4218/etrij.2024-0139 |
Summary: Deep learning (DL) has significantly advanced artificial intelligence (AI); however, frameworks such as PyTorch, ONNX, and TensorFlow are optimized for general-purpose GPUs, leading to inefficiencies on specialized accelerators such as neural processing units (NPUs) and processing-in-memory (PIM) devices. These accelerators are designed to optimize both throughput and energy efficiency, but they require more tailored optimizations. To address these limitations, we propose the NEST compiler (NEST-C), a novel DL framework that improves the deployment and performance of models across various AI accelerators. NEST-C leverages profiling-based quantization, dynamic graph partitioning, and multi-level intermediate representation (IR) integration for efficient execution on diverse hardware platforms. Our results show that NEST-C significantly enhances computational efficiency and adaptability across various AI accelerators, achieving higher throughput, lower latency, improved resource utilization, and greater model portability. These benefits contribute to more efficient DL model deployment in modern AI applications.
Bibliography: Funding information: This study was supported by a grant from the Institute of Information & Communications Technology Planning & Evaluation (IITP), funded by the Korean government (MSIT) (No. RS-2023-00277060, Development of OpenEdge AI SoC hardware and software platform).
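A note on one of the techniques named in the abstract: "profiling-based quantization" generally denotes calibration-driven post-training quantization, in which tensor value ranges are measured (profiled) on representative inputs and then converted into integer scale/zero-point parameters. The sketch below is a minimal, generic illustration of that idea in Python; the names (`RangeProfiler`, `quantize`) are hypothetical and do not reflect NEST-C's actual interfaces.

```python
# Minimal sketch of profiling-based (calibration) quantization.
# Hypothetical helper names; NOT NEST-C's API -- illustration only.
import numpy as np

class RangeProfiler:
    """Tracks the observed min/max of a tensor across calibration batches."""
    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def observe(self, tensor: np.ndarray) -> None:
        self.min_val = min(self.min_val, float(tensor.min()))
        self.max_val = max(self.max_val, float(tensor.max()))

    def qparams(self, num_bits: int = 8):
        """Derive asymmetric quantization parameters from the profiled range."""
        qmin, qmax = 0, (1 << num_bits) - 1
        lo, hi = min(self.min_val, 0.0), max(self.max_val, 0.0)  # keep 0.0 exactly representable
        scale = (hi - lo) / (qmax - qmin) or 1.0
        zero_point = int(round(qmin - lo / scale))
        return scale, zero_point

def quantize(x: np.ndarray, scale: float, zero_point: int, num_bits: int = 8) -> np.ndarray:
    """Map float values to unsigned integers using the profiled scale/zero-point."""
    qmax = (1 << num_bits) - 1
    return np.clip(np.round(x / scale) + zero_point, 0, qmax).astype(np.uint8)

# Usage: profile activations over a small calibration set, then quantize new data.
profiler = RangeProfiler()
for _ in range(8):  # a few calibration batches (random data stands in for real inputs)
    profiler.observe(np.random.randn(1, 64).astype(np.float32))
scale, zp = profiler.qparams()
q = quantize(np.random.randn(1, 64).astype(np.float32), scale, zp)
```

In a compiler setting, parameters derived this way would typically be attached to graph nodes so that integer kernels can run on the target accelerator; the statistics-gathering step is what makes the approach "profiling-based" rather than relying on fixed, hand-set ranges.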