NEST‐C: A deep learning compiler framework for heterogeneous computing systems with artificial intelligence accelerators
Published in | ETRI Journal, Vol. 46, no. 5, pp. 851-864 |
Main Authors | Park, Jeman; Yu, Misun; Kwon, Jinse; Park, Junmo; Lee, Jemin; Kwon, Yongin |
Format | Journal Article |
Language | English |
Published | Electronics and Telecommunications Research Institute (ETRI; 한국전자통신연구원), 01.10.2024 |
Subjects | AI accelerator; deep learning compiler; heterogeneous computing; model quantization; multi-level IR; electronics/information and communication engineering |
Online Access | https://doaj.org/article/a94f0974947b454984dc245bbc413c74 |
ISSN | 1225-6463 |
EISSN | 2233-7326 |
DOI | 10.4218/etrij.2024-0139 |
Abstract | Deep learning (DL) has significantly advanced artificial intelligence (AI); however, frameworks such as PyTorch, ONNX, and TensorFlow are optimized for general‐purpose GPUs, leading to inefficiencies on specialized accelerators such as neural processing units (NPUs) and processing‐in‐memory (PIM) devices. These accelerators are designed to optimize both throughput and energy efficiency, but they require more tailored optimizations. To address these limitations, we propose the NEST compiler (NEST‐C), a novel DL framework that improves the deployment and performance of models across various AI accelerators. NEST‐C leverages profiling‐based quantization, dynamic graph partitioning, and multi‐level intermediate representation (IR) integration for efficient execution on diverse hardware platforms. Our results show that NEST‐C significantly enhances computational efficiency and adaptability across various AI accelerators, achieving higher throughput, lower latency, improved resource utilization, and greater model portability. These benefits contribute to more efficient DL model deployment in modern AI applications. |
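The abstract names profiling‐based quantization as one of NEST‐C's core techniques. As a rough illustration of the general idea only (not NEST‐C's actual API or implementation; every function and variable name below is hypothetical), the following Python/NumPy sketch profiles activation ranges over calibration batches and then applies affine int8 quantization using the profiled range.

```python
# Minimal, self-contained sketch of profiling-based (calibration) int8 quantization.
# Illustrative only: names and structure are hypothetical, not NEST-C's API.
import numpy as np

def profile_activation_range(calibration_batches):
    """Profiling step: collect global min/max statistics over calibration data."""
    lo, hi = np.inf, -np.inf
    for batch in calibration_batches:
        lo = min(lo, float(batch.min()))
        hi = max(hi, float(batch.max()))
    return lo, hi

def quantize_int8(tensor, lo, hi):
    """Affine-quantize a float32 tensor to int8 using the profiled range."""
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(np.round(-lo / scale)) - 128
    q = np.clip(np.round(tensor / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map int8 values back to float32 to estimate quantization error."""
    return (q.astype(np.float32) - zero_point) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    calib = [rng.normal(size=(8, 64)).astype(np.float32) for _ in range(4)]
    lo, hi = profile_activation_range(calib)
    x = rng.normal(size=(8, 64)).astype(np.float32)
    q, scale, zp = quantize_int8(x, lo, hi)
    err = float(np.abs(dequantize(q, scale, zp) - x).mean())
    print(f"profiled range=({lo:.3f}, {hi:.3f}), mean abs error={err:.5f}")
```

In a full compiler flow, the profiled ranges would be attached to graph nodes in the IR and used to emit integer kernels for the target accelerator; this sketch only demonstrates the calibrate-then-quantize pattern described in the abstract.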
Authors |
– Jeman Park, Electronics and Telecommunications Research Institute, ORCID 0009-0002-9524-0738
– Misun Yu, Electronics and Telecommunications Research Institute, ORCID 0000-0001-7319-1053
– Jinse Kwon, Electronics and Telecommunications Research Institute, ORCID 0000-0003-3091-9926
– Junmo Park, Samsung Electronics, ORCID 0000-0002-8500-8874
– Jemin Lee, Electronics and Telecommunications Research Institute, leejaymin@etri.re.kr, ORCID 0000-0002-9332-3508
– Yongin Kwon, Electronics and Telecommunications Research Institute, yongin.kwon@etri.re.kr, ORCID 0000-0003-2973-246X |
Copyright | 1225‐6463/$ © 2024 ETRI |
Open Access | Yes |
Peer Reviewed | Yes |
Notes | Funding information: This study is supported by a grant from the Institute of Information & Communications Technology Planning & Evaluation (IITP), funded by the Korean government (MSIT) (No. RS‐2023‐00277060, Development of OpenEdge AI SoC hardware and software platform). |