NEST‐C: A deep learning compiler framework for heterogeneous computing systems with artificial intelligence accelerators

Bibliographic Details
Published in: ETRI Journal, Vol. 46, no. 5, pp. 851-864
Main Authors: Park, Jeman; Yu, Misun; Kwon, Jinse; Park, Junmo; Lee, Jemin; Kwon, Yongin
Format: Journal Article
Language: English
Published: Electronics and Telecommunications Research Institute (ETRI), 01.10.2024
한국전자통신연구원
Subjects: AI accelerator; deep learning compiler; heterogeneous computing; model quantization; multi-level IR; electronics/information and communications engineering
ISSN: 1225-6463
EISSN: 2233-7326
DOI: 10.4218/etrij.2024-0139

Abstract: Deep learning (DL) has significantly advanced artificial intelligence (AI); however, frameworks such as PyTorch, ONNX, and TensorFlow are optimized for general-purpose GPUs, leading to inefficiencies on specialized accelerators such as neural processing units (NPUs) and processing-in-memory (PIM) devices. These accelerators are designed to optimize both throughput and energy efficiency, but they require more tailored optimizations. To address these limitations, we propose the NEST compiler (NEST-C), a novel DL framework that improves the deployment and performance of models across various AI accelerators. NEST-C leverages profiling-based quantization, dynamic graph partitioning, and multi-level intermediate representation (IR) integration for efficient execution on diverse hardware platforms. Our results show that NEST-C significantly enhances computational efficiency and adaptability across various AI accelerators, achieving higher throughput, lower latency, improved resource utilization, and greater model portability. These benefits contribute to more efficient DL model deployment in modern AI applications.
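The abstract lists profiling-based quantization among NEST-C's techniques. As a rough illustration of that general idea only (not NEST-C's actual API or algorithm; all names below are hypothetical), the following sketch profiles activation ranges on calibration data and derives an asymmetric int8 scale/zero-point from the observed min/max:

import numpy as np

def profile_range(calibration_batches, run_layer):
    """Run calibration data through a layer and record the observed min/max."""
    lo, hi = float("inf"), float("-inf")
    for batch in calibration_batches:
        out = run_layer(batch)
        lo, hi = min(lo, float(out.min())), max(hi, float(out.max()))
    return lo, hi

def int8_qparams(lo, hi):
    """Derive an asymmetric int8 scale and zero-point from a profiled range."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)          # keep zero exactly representable
    scale = (hi - lo) / 255.0 or 1.0             # avoid a zero scale for constant tensors
    zero_point = -128 - int(round(lo / scale))   # lo maps to -128, hi maps to 127
    return scale, zero_point

def quantize(x, scale, zero_point):
    """Map a float tensor to int8 using the profiled quantization parameters."""
    return np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)

# Toy usage: profile a ReLU "layer" on random calibration batches.
rng = np.random.default_rng(0)
calib = [rng.standard_normal((8, 16)).astype(np.float32) for _ in range(4)]
relu = lambda x: np.maximum(x, 0.0)
scale, zp = int8_qparams(*profile_range(calib, relu))
q = quantize(relu(calib[0]), scale, zp)

In a real compiler flow, parameters like these would be attached per tensor in the IR before code generation; the per-tensor asymmetric mapping shown here is one common choice, not necessarily the one NEST-C uses.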
Author details:
  1. Park, Jeman (ORCID: 0009-0002-9524-0738), Electronics and Telecommunications Research Institute
  2. Yu, Misun (ORCID: 0000-0001-7319-1053), Electronics and Telecommunications Research Institute
  3. Kwon, Jinse (ORCID: 0000-0003-3091-9926), Electronics and Telecommunications Research Institute
  4. Park, Junmo (ORCID: 0000-0002-8500-8874), Samsung Electronics
  5. Lee, Jemin (ORCID: 0000-0002-9332-3508), Electronics and Telecommunications Research Institute, leejaymin@etri.re.kr
  6. Kwon, Yongin (ORCID: 0000-0003-2973-246X), Electronics and Telecommunications Research Institute, yongin.kwon@etri.re.kr
BackLink: https://www.kci.go.kr/kciportal/ci/sereArticleSearch/ciSereArtiView.kci?sereArticleSearchBean.artiId=ART003130655 (access content in the National Research Foundation of Korea (NRF))
Copyright: 1225-6463/$ © 2024 ETRI
Discipline: Engineering
Open Access: yes
Peer Reviewed: yes
Scholarly: yes
Notes: Funding information
This study is supported by a grant from the Institute of Information & Communications Technology Planning & Evaluation (IITP), funded by the Korean government (MSIT) (No. RS‐2023‐00277060, Development of OpenEdge AI SoC hardware and software platform).
https://doi.org/10.4218/etrij.2024-0139
OpenAccessLink: https://doaj.org/article/a94f0974947b454984dc245bbc413c74
URI: https://onlinelibrary.wiley.com/doi/abs/10.4218%2Fetrij.2024-0139
     https://doaj.org/article/a94f0974947b454984dc245bbc413c74
     https://www.kci.go.kr/kciportal/ci/sereArticleSearch/ciSereArtiView.kci?sereArticleSearchBean.artiId=ART003130655