SUN: Dynamic Hybrid-Precision SRAM-Based CIM Accelerator With High Macro Utilization Using Structured Pruning Mixed-Precision Networks
Published in | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 43, no. 7, pp. 2163-2176 |
---|---|
Main Authors | Chen, Yen-Wen; Wang, Rui-Hsuan; Cheng, Yu-Hsiang; Lu, Chih-Cheng; Chang, Meng-Fan; Tang, Kea-Tiong |
Format | Journal Article |
Language | English |
Published | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.07.2024 |
Abstract | Convolutional neural networks (CNNs) play a key role in many deep learning applications; however, these networks are resource intensive. The parallel computing ability of computing-in-memory (CIM) enables high energy efficiency in artificial intelligence accelerators. When implementing a CNN in CIM, quantization and pruning are indispensable for reducing the calculation complexity and improving the efficiency of hardware calculations. Mixed-precision quantization with flexible bit widths provides a better efficiency-accuracy tradeoff than fixed-precision quantization. However, CIM calculations for mixed-precision models are inefficient because the fixed capacity of CIM macros is redundant for hybrid-precision distributions. To address this, we propose a software and hardware co-designed static random-access memory (SRAM)-based CIM architecture called SUN, including a CIM-adaptive mixed-precision joint pruning-quantization algorithm and a dynamic hybrid-precision CNN accelerator. Three techniques are implemented in this architecture: 1) a mixed-precision joint pruning algorithm for reducing memory access and removing redundant computation; 2) a CIM-adaptive filter-wise and paired mixed-precision quantization for improving CIM macro utilization; and 3) an SRAM-based CIM CNN accelerator in which the SRAM CIM macro is used as the processing element to support sparse and mixed-precision CNN computation with high CIM macro utilization. This architecture achieves a system area efficiency of 428.2 TOPS/mm² and a throughput of 792.2 GOPS on the CIFAR-10 dataset. |
---|---|
Author | Chang, Meng-Fan; Lu, Chih-Cheng; Cheng, Yu-Hsiang; Chen, Yen-Wen; Tang, Kea-Tiong; Wang, Rui-Hsuan |
Author_xml | 1. Chen, Yen-Wen (Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan); 2. Wang, Rui-Hsuan (Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan); 3. Cheng, Yu-Hsiang (Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan); 4. Lu, Chih-Cheng, ORCID 0000-0003-2987-5719 (Information and Communication Laboratory, Industrial Technology Research Institute, Chutung, Taiwan); 5. Chang, Meng-Fan, ORCID 0000-0001-6905-6350 (Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan); 6. Tang, Kea-Tiong, ORCID 0000-0002-9689-1236, kttang@ee.nthu.edu.tw (Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan) |
CODEN | ITCSDI |
CitedBy_id | crossref_primary_10_1038_s41586_025_08639_2 crossref_primary_10_1109_TVLSI_2024_3502359 |
Cites_doi | 10.1007/978-3-031-20083-0_16 10.1109/DAC18072.2020.9218724 10.1145/3489517.3530660 10.1109/tcad.2021.3078408 10.1109/CVPR.2016.90 10.23919/vlsic.2019.8778028 10.1109/CVPR.2009.5206848 10.1007/978-3-030-58526-6_16 10.1109/isscc42614.2022.9731681 10.1109/ICCV48922.2021.00530 10.1109/ICCV.2019.00038 10.1109/TCAD.2021.3082107 10.1109/ISCAS48785.2022.9938010 10.1109/tcad.2023.3248503 10.1109/ICCV.2017.298 10.1109/jssc.2022.3148273 10.1109/ISSCC42615.2023.10067779 10.1109/ISSCC.2018.8310261 10.1109/jetcas.2023.3242761 10.1038/s41565-020-0655-z 10.1109/tcad.2022.3197495 10.1109/JSSC.2016.2616357 10.1109/isscc42614.2022.9731645 10.1609/aaai.v35i12.17269 10.1109/aicas54282.2022.9870005 10.1109/isscc42613.2021.9365766 10.1007/978-3-030-01237-3_23 |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
DBID | 97E RIA RIE AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D |
DOI | 10.1109/TCAD.2024.3358583 |
DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005-present; IEEE All-Society Periodicals Package (ASPP) 1998-Present; IEEE Electronic Library (IEL); CrossRef; Computer and Information Systems Abstracts; Electronics & Communications Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts – Academic; Computer and Information Systems Abstracts Professional |
DatabaseTitle | CrossRef; Technology Research Database; Computer and Information Systems Abstracts – Academic; Electronics & Communications Abstracts; ProQuest Computer Science Collection; Computer and Information Systems Abstracts; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Professional |
DatabaseTitleList | Technology Research Database |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering |
EISSN | 1937-4151 |
EndPage | 2176 |
ExternalDocumentID | 10_1109_TCAD_2024_3358583 10414052 |
Genre | orig-research |
GrantInformation_xml | – fundername: National Science and Technology Council, Taiwan grantid: MOST 111-2218-E-007-009; MOST 111-2221-E-007-108-MY3 funderid: 10.13039/501100020950 |
ID | FETCH-LOGICAL-c294t-e239a1466efd940b52aecfe2fce4f2c06f6ba6b3360efc419b3abd2eb3fc898b3 |
IEDL.DBID | RIE |
ISSN | 0278-0070 |
IngestDate | Mon Jun 30 10:11:48 EDT 2025 Tue Jul 01 02:13:04 EDT 2025 Thu Apr 24 23:12:03 EDT 2025 Wed Aug 27 02:06:04 EDT 2025 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 7 |
Language | English |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-c294t-e239a1466efd940b52aecfe2fce4f2c06f6ba6b3360efc419b3abd2eb3fc898b3 |
Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
ORCID | 0000-0001-6905-6350 0000-0002-9689-1236 0000-0003-2987-5719 |
PQID | 3069615462 |
PQPubID | 85470 |
PageCount | 14 |
ParticipantIDs | ieee_primary_10414052 proquest_journals_3069615462 crossref_citationtrail_10_1109_TCAD_2024_3358583 crossref_primary_10_1109_TCAD_2024_3358583 |
ProviderPackageCode | CITATION AAYXX |
PublicationCentury | 2000 |
PublicationDate | 2024-07-01 |
PublicationDateYYYYMMDD | 2024-07-01 |
PublicationDate_xml | – month: 07 year: 2024 text: 2024-07-01 day: 01 |
PublicationDecade | 2020 |
PublicationPlace | New York |
PublicationPlace_xml | – name: New York |
PublicationTitle | IEEE transactions on computer-aided design of integrated circuits and systems |
PublicationTitleAbbrev | TCAD |
PublicationYear | 2024 |
Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
SSID | ssj0014529 |
Score | 2.43657 |
SourceID | proquest crossref ieee |
SourceType | Aggregation Database Enrichment Source Index Database Publisher |
StartPage | 2163 |
SubjectTerms | Adaptive filters; Algorithms; Artificial intelligence; Artificial neural networks; Co-design; Common Information Model (computing); Compression algorithms; Computational modeling; Computer architecture; computing-in-memory (CIM); Convolutional neural networks; deep learning; Efficiency; Hardware; Machine learning; Memory management; quantization; Quantization (signal); Random access memory; Static random access memory; Utilization |
Title | SUN: Dynamic Hybrid-Precision SRAM-Based CIM Accelerator With High Macro Utilization Using Structured Pruning Mixed-Precision Networks |
URI | https://ieeexplore.ieee.org/document/10414052 https://www.proquest.com/docview/3069615462 |
Volume | 43 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
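
The abstract's claim that fixed-capacity CIM macros are poorly utilized by mixed-precision weights, and that pairing helps, can be illustrated with a toy calculation. The sketch below is not the SUN algorithm: it assumes a hypothetical macro row that holds 8 weight bits (`MACRO_BITS`, a value chosen for illustration and not taken from the paper) and interprets "paired" loosely as placing two filters in one row when their bit widths fit together. All function names are made up for this example.

```python
from typing import List, Tuple

# Hypothetical capacity of one macro row, in weight bits (illustrative only).
MACRO_BITS = 8


def utilization_unpaired(bit_widths: List[int], macro_bits: int = MACRO_BITS) -> float:
    """Utilization when every filter occupies its own macro row."""
    return sum(bit_widths) / (len(bit_widths) * macro_bits)


def pair_filters(bit_widths: List[int], macro_bits: int = MACRO_BITS) -> List[Tuple[int, ...]]:
    """Greedy two-pointer pairing: put the narrowest and widest remaining
    filters in the same row when their widths fit; otherwise the widest
    filter gets a row to itself."""
    if any(w < 1 or w > macro_bits for w in bit_widths):
        raise ValueError("each bit width must be between 1 and macro_bits")
    widths = sorted(bit_widths)
    lo, hi = 0, len(widths) - 1
    rows: List[Tuple[int, ...]] = []
    while lo <= hi:
        if lo < hi and widths[lo] + widths[hi] <= macro_bits:
            rows.append((widths[lo], widths[hi]))
            lo += 1
        else:
            rows.append((widths[hi],))
        hi -= 1
    return rows


def utilization_paired(bit_widths: List[int], macro_bits: int = MACRO_BITS) -> float:
    """Utilization after pairing filters into shared macro rows."""
    rows = pair_filters(bit_widths, macro_bits)
    return sum(bit_widths) / (len(rows) * macro_bits)


if __name__ == "__main__":
    widths = [2, 6, 4, 4, 2, 6]  # per-filter weight precisions (made-up example)
    print(utilization_unpaired(widths))  # 24 used bits over 6 rows of 8 bits
    print(pair_filters(widths))
    print(utilization_paired(widths))    # same 24 bits over 3 rows of 8 bits
```

With the made-up widths above, giving each filter its own row fills half of the available bits, while pairing complementary widths fills every row. A quantizer that is aware of the macro capacity could in principle steer bit-width choices toward such complementary pairs, which is one plausible reading of "CIM-adaptive" in the abstract.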