High-Performance FPGA-Based CNN Accelerator With Block-Floating-Point Arithmetic
Convolutional neural networks (CNNs) are widely used and have achieved great success in computer vision and speech processing applications. However, deploying the large-scale CNN model in the embedded system is subject to the constraints of computation and memory. An optimized block-floating-point (BFP) arithmetic is adopted in our accelerator for efficient inference of deep neural networks.
Published in | IEEE transactions on very large scale integration (VLSI) systems, Vol. 27, No. 8, pp. 1874-1885 |
Main Authors | Lian, Xiaocong; Liu, Zhenyu; Song, Zhourui; Dai, Jiwu; Zhou, Wei; Ji, Xiangyang |
Format | Journal Article |
Language | English |
Published | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.08.2019 |
Abstract | Convolutional neural networks (CNNs) are widely used and have achieved great success in computer vision and speech processing applications. However, deploying the large-scale CNN model in the embedded system is subject to the constraints of computation and memory. An optimized block-floating-point (BFP) arithmetic is adopted in our accelerator for efficient inference of deep neural networks in this paper. The feature maps and model parameters are represented in 16-bit and 8-bit formats, respectively, in the off-chip memory, which can reduce memory and off-chip bandwidth requirements by 50% and 75% compared to the 32-bit FP counterpart. The proposed 8-bit BFP arithmetic with optimized rounding and shifting-operation-based quantization schemes improves the energy and hardware efficiency by three times. One CNN model can be deployed in our accelerator without retraining at the cost of an accuracy loss of not more than 0.12%. The proposed reconfigurable accelerator with three parallelism dimensions, ping-pong off-chip DDR3 memory access, and an optimized on-chip buffer group is implemented on the Xilinx VC709 evaluation board. Our accelerator achieves a performance of 760.83 GOP/s and 82.88 GOP/s/W under a 200-MHz working frequency, significantly outperforming previous accelerators. |
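The BFP scheme the abstract describes stores one shared exponent per block of values and a short fixed-point mantissa per element, so dequantization needs only shift operations. The following is a minimal NumPy sketch of that idea, not the paper's implementation; the function names, the exponent-selection rule, and round-to-nearest are assumptions for illustration:

```python
import numpy as np

def bfp_quantize(block, mantissa_bits=8):
    """Quantize a block of float32 values to block-floating-point:
    one shared exponent for the whole block, an integer mantissa per element.
    (Illustrative sketch; not the accelerator's exact rounding scheme.)"""
    max_abs = np.max(np.abs(block))
    if max_abs == 0:
        return np.zeros(block.shape, dtype=np.int32), 0
    # Shared exponent chosen so the largest magnitude fits the mantissa range.
    shared_exp = int(np.floor(np.log2(max_abs))) + 1
    # Scale so mantissas lie in [-2^(bits-1), 2^(bits-1) - 1]; round to nearest.
    scale = 2.0 ** (mantissa_bits - 1 - shared_exp)
    mantissas = np.clip(np.round(block * scale),
                        -(2 ** (mantissa_bits - 1)),
                        2 ** (mantissa_bits - 1) - 1).astype(np.int32)
    return mantissas, shared_exp

def bfp_dequantize(mantissas, shared_exp, mantissa_bits=8):
    """Recover approximate float values: a single shift (scale) per block."""
    return mantissas.astype(np.float32) * 2.0 ** (shared_exp - (mantissa_bits - 1))
```

Because every element in a block shares the exponent, the per-element storage drops to the mantissa width alone (8 bits for weights, 16 bits for feature maps in the paper's configuration), which is where the 75% and 50% reductions relative to 32-bit floating point come from.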
Author | Lian, Xiaocong; Liu, Zhenyu; Song, Zhourui; Dai, Jiwu; Zhou, Wei; Ji, Xiangyang |
Author details |
– Xiaocong Lian (lian900625@tsinghua.edu.cn; ORCID 0000-0003-3917-5273): Department of Automation, Tsinghua University, Beijing, China
– Zhenyu Liu (liuzhenyu73@mail.tsinghua.edu.cn): Tsinghua National Laboratory for Information Science and Technology, Research Institute of Information Technology, Tsinghua University, Beijing, China
– Zhourui Song (zrsong99@163.com): School of Cyberspace Security, Beijing University of Posts and Telecommunications (BUPT), Beijing, China
– Jiwu Dai (mr_dai333@mail.nwpu.edu.cn): School of Electronics and Information, Northwestern Polytechnical University, Xi'an, China
– Wei Zhou (zhouwei@nwpu.edu.cn; ORCID 0000-0001-9715-6957): School of Electronics and Information, Northwestern Polytechnical University, Xi'an, China
– Xiangyang Ji (xyji@tsinghua.edu.cn; ORCID 0000-0002-7333-9975): Department of Automation, Tsinghua University, Beijing, China |
CODEN | IEVSE9 |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2019 |
DOI | 10.1109/TVLSI.2019.2913958 |
Discipline | Engineering |
EISSN | 1557-9999 |
EndPage | 1885 |
Genre | orig-research |
GrantInformation | National Natural Science Foundation of China (funder ID 10.13039/501100001809), grants 61827804, 61836012, 61620106005, 61325003 |
ISSN | 1063-8210 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 8 |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
ORCID | 0000-0001-9715-6957 0000-0002-7333-9975 0000-0003-3917-5273 |
PageCount | 12 |
PublicationDate | 2019-08-01 |
PublicationPlace | New York |
PublicationTitle | IEEE transactions on very large scale integration (VLSI) systems |
PublicationTitleAbbrev | TVLSI |
PublicationYear | 2019 |
Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
StartPage | 1874 |
SubjectTerms | Accelerators; Artificial neural networks; Block floating point (BFP); Computational modeling; Computer memory; Computer vision; Convolutional neural network (CNN) accelerator; Embedded systems; Feature maps; Field-programmable gate array (FPGA); Floating point arithmetic; Hardware; Mathematical model; Memory management; Neural networks; Quantization (signal); Retraining; Rounding; Speech processing; Three-level parallelism |
Title | High-Performance FPGA-Based CNN Accelerator With Block-Floating-Point Arithmetic |
URI | https://ieeexplore.ieee.org/document/8716697 https://www.proquest.com/docview/2264437857 |
Volume | 27 |