NEST‐C: A deep learning compiler framework for heterogeneous computing systems with artificial intelligence accelerators

Bibliographic Details
Published in: ETRI Journal, Vol. 46, no. 5, pp. 851-864
Main Authors: Park, Jeman; Yu, Misun; Kwon, Jinse; Park, Junmo; Lee, Jemin; Kwon, Yongin
Format: Journal Article
Language: English
Published: Electronics and Telecommunications Research Institute (ETRI), 01.10.2024
한국전자통신연구원
Subjects: AI accelerator; deep learning compiler; heterogeneous computing; model quantization; multi-level IR; electronics/information and communications engineering
ISSN: 1225-6463
EISSN: 2233-7326
DOI: 10.4218/etrij.2024-0139

Abstract: Deep learning (DL) has significantly advanced artificial intelligence (AI); however, frameworks such as PyTorch, ONNX, and TensorFlow are optimized for general-purpose GPUs, leading to inefficiencies on specialized accelerators such as neural processing units (NPUs) and processing-in-memory (PIM) devices. These accelerators are designed to optimize both throughput and energy efficiency, but they require more tailored optimizations. To address these limitations, we propose the NEST compiler (NEST-C), a novel DL framework that improves the deployment and performance of models across various AI accelerators. NEST-C leverages profiling-based quantization, dynamic graph partitioning, and multi-level intermediate representation (IR) integration for efficient execution on diverse hardware platforms. Our results show that NEST-C significantly enhances computational efficiency and adaptability across various AI accelerators, achieving higher throughput, lower latency, improved resource utilization, and greater model portability. These benefits contribute to more efficient DL model deployment in modern AI applications.
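The abstract lists profiling-based quantization among NEST-C's techniques. As a rough illustration of that general idea only (not NEST-C's actual API or algorithm; all names below are hypothetical), the following sketch profiles activation ranges on calibration data and derives an asymmetric int8 scale/zero-point from the observed min/max:

import numpy as np

def profile_range(calibration_batches, run_layer):
    """Run calibration data through a layer and record the observed min/max."""
    lo, hi = float("inf"), float("-inf")
    for batch in calibration_batches:
        out = run_layer(batch)
        lo, hi = min(lo, float(out.min())), max(hi, float(out.max()))
    return lo, hi

def int8_qparams(lo, hi):
    """Derive an asymmetric int8 scale and zero-point from a profiled range."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)          # keep zero exactly representable
    scale = (hi - lo) / 255.0 or 1.0             # avoid a zero scale for constant tensors
    zero_point = -128 - int(round(lo / scale))   # lo maps to -128, hi maps to 127
    return scale, zero_point

def quantize(x, scale, zero_point):
    """Map a float tensor to int8 using the profiled quantization parameters."""
    return np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)

# Toy usage: profile a ReLU "layer" on random calibration batches.
rng = np.random.default_rng(0)
calib = [rng.standard_normal((8, 16)).astype(np.float32) for _ in range(4)]
relu = lambda x: np.maximum(x, 0.0)
scale, zp = int8_qparams(*profile_range(calib, relu))
q = quantize(relu(calib[0]), scale, zp)

In a real compiler flow, parameters like these would be attached per tensor in the IR before code generation; the per-tensor asymmetric mapping shown here is one common choice, not necessarily the one NEST-C uses.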
Author details:
  1. Park, Jeman (ORCID: 0009-0002-9524-0738), Electronics and Telecommunications Research Institute
  2. Yu, Misun (ORCID: 0000-0001-7319-1053), Electronics and Telecommunications Research Institute
  3. Kwon, Jinse (ORCID: 0000-0003-3091-9926), Electronics and Telecommunications Research Institute
  4. Park, Junmo (ORCID: 0000-0002-8500-8874), Samsung Electronics
  5. Lee, Jemin (ORCID: 0000-0002-9332-3508), Electronics and Telecommunications Research Institute, leejaymin@etri.re.kr
  6. Kwon, Yongin (ORCID: 0000-0003-2973-246X), Electronics and Telecommunications Research Institute, yongin.kwon@etri.re.kr
BackLink: https://www.kci.go.kr/kciportal/ci/sereArticleSearch/ciSereArtiView.kci?sereArticleSearchBean.artiId=ART003130655 (access content in the National Research Foundation of Korea (NRF))
Copyright: 1225-6463/$ © 2024 ETRI
Discipline: Engineering
Open Access: yes
Peer Reviewed: yes
Scholarly: yes
Notes: Funding information
This study is supported by a grant from the Institute of Information & Communications Technology Planning & Evaluation (IITP), funded by the Korean government (MSIT) (No. RS‐2023‐00277060, Development of OpenEdge AI SoC hardware and software platform).
https://doi.org/10.4218/etrij.2024-0139
OpenAccessLink: https://doaj.org/article/a94f0974947b454984dc245bbc413c74
URI: https://onlinelibrary.wiley.com/doi/abs/10.4218%2Fetrij.2024-0139
     https://doaj.org/article/a94f0974947b454984dc245bbc413c74
     https://www.kci.go.kr/kciportal/ci/sereArticleSearch/ciSereArtiView.kci?sereArticleSearchBean.artiId=ART003130655