SUN: Dynamic Hybrid-Precision SRAM-Based CIM Accelerator With High Macro Utilization Using Structured Pruning Mixed-Precision Networks

Bibliographic Details
Published in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 43, no. 7, pp. 2163-2176
Main Authors Chen, Yen-Wen; Wang, Rui-Hsuan; Cheng, Yu-Hsiang; Lu, Chih-Cheng; Chang, Meng-Fan; Tang, Kea-Tiong
Format Journal Article
Language English
Published New York, IEEE, 1 July 2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Abstract Convolutional neural networks (CNNs) play a key role in many deep learning applications; however, these networks are resource intensive. The parallel computing ability of computing-in-memory (CIM) enables high energy efficiency in artificial intelligence accelerators. When implementing a CNN in CIM, quantization and pruning are indispensable for reducing computational complexity and improving hardware efficiency. Mixed-precision quantization with flexible bit widths provides a better efficiency-accuracy tradeoff than fixed-precision quantization. However, CIM calculations for mixed-precision models are inefficient because the fixed capacity of CIM macros is redundant for hybrid-precision distributions. To address this, we propose a software-hardware co-designed static random-access memory (SRAM)-based CIM architecture called SUN, comprising a CIM-adaptive mixed-precision joint pruning-quantization algorithm and a dynamic hybrid-precision CNN accelerator. Three techniques are implemented in this architecture: 1) a mixed-precision joint pruning algorithm that reduces memory access and removes redundant computation; 2) CIM-adaptive filter-wise and paired mixed-precision quantization that improves CIM macro utilization; and 3) an SRAM-based CIM CNN accelerator in which the SRAM CIM macro serves as the processing element, supporting sparse and mixed-precision CNN computation with high CIM macro utilization. This architecture achieves a system area efficiency of 428.2 TOPS/mm² and a throughput of 792.2 GOPS on the CIFAR-10 dataset.
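The utilization problem the abstract describes can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a hypothetical CIM macro whose weight slot is 8 bits wide, and it compares how much of that capacity is used when each filter gets its own slot versus when filters are paired so that their bit widths share a slot. The slot width, the greedy widest-with-narrowest pairing rule, and all function names are assumptions made for illustration only.

```python
from typing import List, Tuple

SLOT_BITS = 8  # assumed capacity of one weight slot in a CIM macro (hypothetical)


def utilization_unpaired(bit_widths: List[int], slot_bits: int = SLOT_BITS) -> float:
    """Fraction of slot capacity used when every filter occupies a whole slot."""
    return sum(bit_widths) / (len(bit_widths) * slot_bits)


def pair_filters(bit_widths: List[int], slot_bits: int = SLOT_BITS) -> List[Tuple[int, ...]]:
    """Greedy pairing (illustrative): widest remaining filter shares a slot
    with the narrowest remaining filter whenever the two fit together."""
    remaining = sorted(bit_widths)
    slots: List[Tuple[int, ...]] = []
    while remaining:
        wide = remaining.pop()  # widest remaining filter
        if remaining and wide + remaining[0] <= slot_bits:
            narrow = remaining.pop(0)  # narrowest remaining filter
            slots.append((wide, narrow))
        else:
            slots.append((wide,))  # no partner fits; the slot holds one filter
    return slots


def utilization_paired(bit_widths: List[int], slot_bits: int = SLOT_BITS) -> float:
    """Fraction of slot capacity used after greedy pairing."""
    slots = pair_filters(bit_widths, slot_bits)
    return sum(bit_widths) / (len(slots) * slot_bits)
```

For example, with per-filter bit widths `[2, 6, 4, 4, 8, 2, 6, 4]`, the unpaired layout uses 36 of 64 bits (utilization 0.5625), while the greedy pairing packs the same filters into five slots (utilization 0.9). A real design would also have to account for pruning, macro geometry, and accuracy constraints, which this sketch ignores.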
Author_xml – sequence: 1
  givenname: Yen-Wen
  surname: Chen
  fullname: Chen, Yen-Wen
  organization: Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan
– sequence: 2
  givenname: Rui-Hsuan
  surname: Wang
  fullname: Wang, Rui-Hsuan
  organization: Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan
– sequence: 3
  givenname: Yu-Hsiang
  surname: Cheng
  fullname: Cheng, Yu-Hsiang
  organization: Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan
– sequence: 4
  givenname: Chih-Cheng
  orcidid: 0000-0003-2987-5719
  surname: Lu
  fullname: Lu, Chih-Cheng
  organization: Information and Communication Laboratory, Industrial Technology Research Institute, Chutung, Taiwan
– sequence: 5
  givenname: Meng-Fan
  orcidid: 0000-0001-6905-6350
  surname: Chang
  fullname: Chang, Meng-Fan
  organization: Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan
– sequence: 6
  givenname: Kea-Tiong
  orcidid: 0000-0002-9689-1236
  surname: Tang
  fullname: Tang, Kea-Tiong
  email: kttang@ee.nthu.edu.tw
  organization: Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan
CODEN ITCSDI
CitedBy_id crossref_primary_10_1038_s41586_025_08639_2
crossref_primary_10_1109_TVLSI_2024_3502359
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
DOI 10.1109/TCAD.2024.3358583
Discipline Engineering
EISSN 1937-4151
EndPage 2176
ExternalDocumentID 10_1109_TCAD_2024_3358583
10414052
Genre orig-research
GrantInformation_xml – fundername: National Science and Technology Council, Taiwan
  grantid: MOST 111-2218-E-007-009; MOST 111-2221-E-007-108-MY3
  funderid: 10.13039/501100020950
ISSN 0278-0070
IsPeerReviewed true
IsScholarly true
Issue 7
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0000-0001-6905-6350
0000-0002-9689-1236
0000-0003-2987-5719
PageCount 14
PublicationCentury 2000
PublicationDate 2024-07-01
PublicationDateYYYYMMDD 2024-07-01
PublicationDate_xml – month: 07
  year: 2024
  text: 2024-07-01
  day: 01
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
PublicationTitleAbbrev TCAD
PublicationYear 2024
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
SubjectTerms Adaptive filters
Algorithms
Artificial intelligence
Artificial neural networks
Co-design
Common Information Model (computing)
Compression algorithms
Computational modeling
Computer architecture
computing-in-memory (CIM)
Convolutional neural networks
deep learning
Efficiency
Hardware
Machine learning
Memory management
quantization
Quantization (signal)
Random access memory
Static random access memory
Utilization
Title SUN: Dynamic Hybrid-Precision SRAM-Based CIM Accelerator With High Macro Utilization Using Structured Pruning Mixed-Precision Networks
URI https://ieeexplore.ieee.org/document/10414052
https://www.proquest.com/docview/3069615462
Volume 43