SUN: Dynamic Hybrid-Precision SRAM-Based CIM Accelerator With High Macro Utilization Using Structured Pruning Mixed-Precision Networks

Convolutional neural networks (CNNs) play a key role in many deep learning applications; however, these networks are resource intensive. The parallel computing ability of computing-in-memory (CIM) enables high energy efficiency in artificial intelligence accelerators. When implementing a CNN in CIM,...

Bibliographic Details
Published in	IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 43, no. 7, pp. 2163-2176
Main Authors	Chen, Yen-Wen; Wang, Rui-Hsuan; Cheng, Yu-Hsiang; Lu, Chih-Cheng; Chang, Meng-Fan; Tang, Kea-Tiong
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.07.2024
Subjects	Adaptive filters; Algorithms; Artificial intelligence; Artificial neural networks; Co-design; Common Information Model (computing); Compression algorithms; Computational modeling; Computer architecture; computing-in-memory (CIM); Convolutional neural networks; deep learning; Efficiency; Hardware; Machine learning; Memory management; quantization; Quantization (signal); Random access memory; Static random access memory; Utilization

Abstract	Convolutional neural networks (CNNs) play a key role in many deep learning applications; however, these networks are resource intensive. The parallel computing ability of computing-in-memory (CIM) enables high energy efficiency in artificial intelligence accelerators. When implementing a CNN in CIM, quantization and pruning are indispensable for reducing computational complexity and improving the efficiency of hardware calculations. Mixed-precision quantization with flexible bit widths provides a better efficiency-accuracy tradeoff than fixed-precision quantization. However, CIM calculations for mixed-precision models are inefficient because the fixed capacity of CIM macros is redundant for hybrid-precision distributions. To address this, we propose a software-hardware co-designed static random-access memory (SRAM)-based CIM architecture called SUN, comprising a CIM-adaptive mixed-precision joint pruning-quantization algorithm and a dynamic hybrid-precision CNN accelerator. Three techniques are implemented in this architecture: 1) a mixed-precision joint pruning algorithm for reducing memory access and removing redundant computation; 2) a CIM-adaptive filter-wise and paired mixed-precision quantization for improving CIM macro utilization; and 3) an SRAM-based CIM CNN accelerator in which the SRAM CIM macro is used as the processing element to support sparse and mixed-precision CNN computation with high CIM macro utilization. This architecture achieves a system area efficiency of 428.2 TOPS/mm² and a throughput of 792.2 GOPS on the CIFAR-10 dataset.
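Note	The utilization problem named in the abstract can be illustrated with a small sketch. It is not the SUN algorithm: the 8-bit slot width, the greedy pairing rule, and every function name below are assumptions made only for illustration. The sketch shows why storing one filter of arbitrary bit width per fixed-width macro slot wastes capacity, and how pairing filters whose bit widths sum to the slot width can recover it.

```python
from typing import List, Tuple

SLOT_BITS = 8  # assumed slot width of a CIM macro row (hypothetical value)


def utilization_unpaired(bit_widths: List[int], slot_bits: int = SLOT_BITS) -> float:
    """Fraction of macro bits used when each filter occupies its own slot."""
    return sum(bit_widths) / (len(bit_widths) * slot_bits)


def pair_filters(bit_widths: List[int], slot_bits: int = SLOT_BITS) -> List[Tuple[int, ...]]:
    """Greedy pairing (an illustrative rule, not taken from the paper):
    put the widest remaining filter in a slot and add the narrowest
    remaining filter to the same slot if the two fit together."""
    if any(b <= 0 or b > slot_bits for b in bit_widths):
        raise ValueError("each bit width must be in 1..slot_bits")
    remaining = sorted(bit_widths)
    slots: List[Tuple[int, ...]] = []
    lo, hi = 0, len(remaining) - 1
    while lo <= hi:
        if lo < hi and remaining[lo] + remaining[hi] <= slot_bits:
            slots.append((remaining[hi], remaining[lo]))  # two filters share a slot
            lo += 1
        else:
            slots.append((remaining[hi],))  # filter sits alone in its slot
        hi -= 1
    return slots


def utilization_paired(bit_widths: List[int], slot_bits: int = SLOT_BITS) -> float:
    """Fraction of macro bits used after greedy pairing."""
    slots = pair_filters(bit_widths, slot_bits)
    return sum(bit_widths) / (len(slots) * slot_bits)


if __name__ == "__main__":
    widths = [2, 6, 4, 4, 8, 2, 6, 8]  # made-up per-filter weight bit widths
    print(utilization_unpaired(widths))
    print(pair_filters(widths))
    print(utilization_paired(widths))
```

In this toy setting, constraining the quantizer so that filter bit widths come in complementary pairs is what lets every slot be filled; how SUN actually chooses and maps the pairs is described in the full text, not in this record.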
Author	Chang, Meng-Fan; Lu, Chih-Cheng; Cheng, Yu-Hsiang; Chen, Yen-Wen; Tang, Kea-Tiong; Wang, Rui-Hsuan
Author_xml	– sequence: 1 givenname: Yen-Wen surname: Chen fullname: Chen, Yen-Wen organization: Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan – sequence: 2 givenname: Rui-Hsuan surname: Wang fullname: Wang, Rui-Hsuan organization: Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan – sequence: 3 givenname: Yu-Hsiang surname: Cheng fullname: Cheng, Yu-Hsiang organization: Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan – sequence: 4 givenname: Chih-Cheng orcidid: 0000-0003-2987-5719 surname: Lu fullname: Lu, Chih-Cheng organization: Information and Communication Laboratory, Industrial Technology Research Institute, Chutung, Taiwan – sequence: 5 givenname: Meng-Fan orcidid: 0000-0001-6905-6350 surname: Chang fullname: Chang, Meng-Fan organization: Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan – sequence: 6 givenname: Kea-Tiong orcidid: 0000-0002-9689-1236 surname: Tang fullname: Tang, Kea-Tiong email: kttang@ee.nthu.edu.tw organization: Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan
CODEN	ITCSDI
CitedBy_id	crossref_primary_10_1038_s41586_025_08639_2 crossref_primary_10_1109_TVLSI_2024_3502359
Cites_doi	10.1007/978-3-031-20083-0_16 10.1109/DAC18072.2020.9218724 10.1145/3489517.3530660 10.1109/tcad.2021.3078408 10.1109/CVPR.2016.90 10.23919/vlsic.2019.8778028 10.1109/CVPR.2009.5206848 10.1007/978-3-030-58526-6_16 10.1109/isscc42614.2022.9731681 10.1109/ICCV48922.2021.00530 10.1109/ICCV.2019.00038 10.1109/TCAD.2021.3082107 10.1109/ISCAS48785.2022.9938010 10.1109/tcad.2023.3248503 10.1109/ICCV.2017.298 10.1109/jssc.2022.3148273 10.1109/ISSCC42615.2023.10067779 10.1109/ISSCC.2018.8310261 10.1109/jetcas.2023.3242761 10.1038/s41565-020-0655-z 10.1109/tcad.2022.3197495 10.1109/JSSC.2016.2616357 10.1109/isscc42614.2022.9731645 10.1609/aaai.v35i12.17269 10.1109/aicas54282.2022.9870005 10.1109/isscc42613.2021.9365766 10.1007/978-3-030-01237-3_23
ContentType	Journal Article
Copyright	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
Copyright_xml	– notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
DBID	97E RIA RIE AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D
DOI	10.1109/TCAD.2024.3358583
DatabaseName	IEEE All-Society Periodicals Package (ASPP) 2005-present IEEE All-Society Periodicals Package (ASPP) 1998-Present IEEE Electronic Library (IEL) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional
DatabaseTitle	CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional
DatabaseTitleList	Technology Research Database
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Engineering
EISSN	1937-4151
EndPage	2176
ExternalDocumentID	10_1109_TCAD_2024_3358583 10414052
Genre	orig-research
GrantInformation_xml	– fundername: National Science and Technology Council, Taiwan grantid: MOST 111-2218-E-007-009; MOST 111-2221-E-007-108-MY3 funderid: 10.13039/501100020950
GroupedDBID	--Z -~X 0R~ 29I 4.4 5GY 5VS 6IK 97E AAJGR AARMG AASAJ AAWTH ABAZT ABQJQ ABVLG ACGFS ACIWK ACNCT AENEX AETIX AGQYO AGSQL AHBIQ AI. AIBXA AKJIK AKQYR ALLEH ALMA_UNASSIGNED_HOLDINGS ASUFR ATWAV BEFXN BFFAM BGNUA BKEBE BPEOZ CS3 DU5 EBS EJD HZ~ H~9 IBMZZ ICLAB IFIPE IFJZH IPLJI JAVBF LAI M43 O9- OCL P2P PZZ RIA RIE RNS TN5 VH1 VJK AAYXX CITATION RIG 7SC 7SP 8FD JQ2 L7M L~C L~D
ID	FETCH-LOGICAL-c294t-e239a1466efd940b52aecfe2fce4f2c06f6ba6b3360efc419b3abd2eb3fc898b3
IEDL.DBID	RIE
ISSN	0278-0070
IngestDate	Mon Jun 30 10:11:48 EDT 2025 Tue Jul 01 02:13:04 EDT 2025 Thu Apr 24 23:12:03 EDT 2025 Wed Aug 27 02:06:04 EDT 2025
IsPeerReviewed	true
IsScholarly	true
Issue	7
Language	English
License	https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-c294t-e239a1466efd940b52aecfe2fce4f2c06f6ba6b3360efc419b3abd2eb3fc898b3
Notes	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14
ORCID	0000-0001-6905-6350 0000-0002-9689-1236 0000-0003-2987-5719
PQID	3069615462
PQPubID	85470
PageCount	14
ParticipantIDs	ieee_primary_10414052 proquest_journals_3069615462 crossref_citationtrail_10_1109_TCAD_2024_3358583 crossref_primary_10_1109_TCAD_2024_3358583
ProviderPackageCode	CITATION AAYXX
PublicationCentury	2000
PublicationDate	2024-07-01
PublicationDateYYYYMMDD	2024-07-01
PublicationDate_xml	– month: 07 year: 2024 text: 2024-07-01 day: 01
PublicationDecade	2020
PublicationPlace	New York
PublicationPlace_xml	– name: New York
PublicationTitle	IEEE transactions on computer-aided design of integrated circuits and systems
PublicationTitleAbbrev	TCAD
PublicationYear	2024
Publisher	IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml	– name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
SSID	ssj0014529
Score	2.43657
SourceID	proquest crossref ieee
SourceType	Aggregation Database Enrichment Source Index Database Publisher
StartPage	2163
Title	SUN: Dynamic Hybrid-Precision SRAM-Based CIM Accelerator With High Macro Utilization Using Structured Pruning Mixed-Precision Networks
URI	https://ieeexplore.ieee.org/document/10414052 https://www.proquest.com/docview/3069615462
Volume	43
hasFullText	1
inHoldings	1
isFullTextHit
isPrint