High-Performance FPGA-Based CNN Accelerator With Block-Floating-Point Arithmetic


Bibliographic Details
Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 27, No. 8, pp. 1874–1885
Main Authors: Lian, Xiaocong; Liu, Zhenyu; Song, Zhourui; Dai, Jiwu; Zhou, Wei; Ji, Xiangyang
Format: Journal Article
Language: English
Published: New York, IEEE, 01.08.2019
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)

Abstract Convolutional neural networks (CNNs) are widely used and have achieved great success in computer vision and speech processing applications. However, deploying large-scale CNN models in embedded systems is constrained by limited computation and memory resources. In this paper, an optimized block-floating-point (BFP) arithmetic is adopted in our accelerator for efficient inference of deep neural networks. The feature maps and model parameters are represented in 16-bit and 8-bit formats, respectively, in the off-chip memory, which reduces the memory and off-chip bandwidth requirements by 50% and 75% compared with the 32-bit floating-point counterpart. The proposed 8-bit BFP arithmetic, with optimized rounding and shifting-operation-based quantization schemes, improves energy and hardware efficiency by three times. A CNN model can be deployed on our accelerator without retraining, at the cost of an accuracy loss of no more than 0.12%. The proposed reconfigurable accelerator, featuring three parallelism dimensions, ping-pong off-chip DDR3 memory access, and an optimized on-chip buffer group, is implemented on the Xilinx VC709 evaluation board. It achieves 760.83 GOP/s and 82.88 GOP/s/W at a 200-MHz working frequency, significantly outperforming previous accelerators.
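The core idea behind block-floating-point arithmetic, as summarized in the abstract, is that a block of values shares a single exponent while each element keeps only a short fixed-point mantissa, so an 8-bit weight format needs just one extra exponent per block. The sketch below is an illustrative assumption of how such quantization can work (function names and the max-magnitude exponent alignment are mine, not the paper's exact hardware design):

```python
import math

def bfp_quantize(block, mantissa_bits=8):
    """Quantize a block of floats to block-floating-point (BFP):
    one shared exponent for the whole block, one fixed-point
    mantissa of `mantissa_bits` bits per element."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return [0] * len(block), 0
    # Shared exponent chosen so the largest magnitude fits the mantissa range
    shared_exp = math.floor(math.log2(max_abs)) + 1
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    lo, hi = -(1 << (mantissa_bits - 1)), (1 << (mantissa_bits - 1)) - 1
    # Round each element to its mantissa, clipping to the signed range
    mantissas = [min(hi, max(lo, round(x / scale))) for x in block]
    return mantissas, shared_exp

def bfp_dequantize(mantissas, shared_exp, mantissa_bits=8):
    """Reconstruct approximate float values from BFP mantissas."""
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]

# Example: three values share one exponent; each is stored in 8 bits.
m, e = bfp_quantize([0.5, -1.25, 3.0])   # m = [16, -40, 96], e = 2
restored = bfp_dequantize(m, e)          # [0.5, -1.25, 3.0]
```

Storing 8-bit mantissas plus a single shared exponent per block is what makes the 75% memory reduction over 32-bit floating point possible for the model parameters; values much smaller than the block maximum lose precision, which is the source of the small accuracy loss the abstract reports.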
Author_xml – sequence: 1
  givenname: Xiaocong
  orcidid: 0000-0003-3917-5273
  surname: Lian
  fullname: Lian, Xiaocong
  email: lian900625@tsinghua.edu.cn
  organization: Department of Automation, Tsinghua University, Beijing, China
– sequence: 2
  givenname: Zhenyu
  surname: Liu
  fullname: Liu, Zhenyu
  email: liuzhenyu73@mail.tsinghua.edu.cn
  organization: Tsinghua National Laboratory for Information Science and Technology, Research Institute of Information Technology, Tsinghua University, Beijing, China
– sequence: 3
  givenname: Zhourui
  surname: Song
  fullname: Song, Zhourui
  email: zrsong99@163.com
  organization: School of Cyberspace Security, Beijing University of Posts and Telecommunications (BUPT), Beijing, China
– sequence: 4
  givenname: Jiwu
  surname: Dai
  fullname: Dai, Jiwu
  email: mr_dai333@mail.nwpu.edu.cn
  organization: School of Electronics and Information, Northwestern Polytechnical University, Xi'an, China
– sequence: 5
  givenname: Wei
  orcidid: 0000-0001-9715-6957
  surname: Zhou
  fullname: Zhou, Wei
  email: zhouwei@nwpu.edu.cn
  organization: School of Electronics and Information, Northwestern Polytechnical University, Xi'an, China
– sequence: 6
  givenname: Xiangyang
  orcidid: 0000-0002-7333-9975
  surname: Ji
  fullname: Ji, Xiangyang
  email: xyji@tsinghua.edu.cn
  organization: Department of Automation, Tsinghua University, Beijing, China
CODEN IEVSE9
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2019
DOI 10.1109/TVLSI.2019.2913958
Discipline Engineering
EISSN 1557-9999
EndPage 1885
ExternalDocumentID 10_1109_TVLSI_2019_2913958
8716697
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 61827804; 61836012; 61620106005; 61325003
  funderid: 10.13039/501100001809
ISSN 1063-8210
IsPeerReviewed true
IsScholarly true
Issue 8
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
PQID 2264437857
PQPubID 85424
PageCount 12
PublicationDate 2019-08-01
PublicationTitle IEEE transactions on very large scale integration (VLSI) systems
PublicationTitleAbbrev TVLSI
PublicationYear 2019
StartPage 1874
SubjectTerms Accelerators
Artificial neural networks
Block floating point (BFP)
Computational modeling
Computer memory
Computer vision
convolutional neural network (CNN) accelerator
Embedded systems
Feature maps
Field programmable gate arrays
field-programmable gate array (FPGA)
Floating point arithmetic
Hardware
Mathematical model
Memory management
Neural networks
Quantization (signal)
Retraining
Rounding
Speech processing
three-level parallel
URI https://ieeexplore.ieee.org/document/8716697
https://www.proquest.com/docview/2264437857
Volume 27