High-Performance FPGA-Based CNN Accelerator With Block-Floating-Point Arithmetic
Convolutional neural networks (CNNs) are widely used and have achieved great success in computer vision and speech processing applications. However, deploying the large-scale CNN model in the embedded system is subject to the constraints of computation and memory. An optimized block-floating-point (BFP) arithmetic is adopted in our accelerator for efficient inference of deep neural networks.
Published in | IEEE transactions on very large scale integration (VLSI) systems, Vol. 27, No. 8, pp. 1874-1885 |
Main Authors | Lian, Xiaocong; Liu, Zhenyu; Song, Zhourui; Dai, Jiwu; Zhou, Wei; Ji, Xiangyang |
Format | Journal Article |
Language | English |
Published | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.08.2019 |
Abstract | Convolutional neural networks (CNNs) are widely used and have achieved great success in computer vision and speech processing applications. However, deploying the large-scale CNN model in the embedded system is subject to the constraints of computation and memory. An optimized block-floating-point (BFP) arithmetic is adopted in our accelerator for efficient inference of deep neural networks in this paper. The feature maps and model parameters are represented in 16-bit and 8-bit formats, respectively, in the off-chip memory, which can reduce memory and off-chip bandwidth requirements by 50% and 75% compared to the 32-bit FP counterpart. The proposed 8-bit BFP arithmetic with optimized rounding and shifting-operation-based quantization schemes improves the energy and hardware efficiency by three times. One CNN model can be deployed in our accelerator without retraining at the cost of an accuracy loss of not more than 0.12%. The proposed reconfigurable accelerator with three parallelism dimensions, ping-pong off-chip DDR3 memory access, and an optimized on-chip buffer group is implemented on the Xilinx VC709 evaluation board. Our accelerator achieves a performance of 760.83 GOP/s and 82.88 GOP/s/W under a 200-MHz working frequency, significantly outperforming previous accelerators. |
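The BFP scheme the abstract describes stores one shared exponent per block of values and a short fixed-point mantissa per element, so dequantization needs only shift operations. The following is a minimal NumPy sketch of that idea, not the paper's implementation; the function names, the exponent-selection rule, and round-to-nearest are assumptions for illustration:

```python
import numpy as np

def bfp_quantize(block, mantissa_bits=8):
    """Quantize a block of float32 values to block-floating-point:
    one shared exponent for the whole block, an integer mantissa per element.
    (Illustrative sketch; not the accelerator's exact rounding scheme.)"""
    max_abs = np.max(np.abs(block))
    if max_abs == 0:
        return np.zeros(block.shape, dtype=np.int32), 0
    # Shared exponent chosen so the largest magnitude fits the mantissa range.
    shared_exp = int(np.floor(np.log2(max_abs))) + 1
    # Scale so mantissas lie in [-2^(bits-1), 2^(bits-1) - 1]; round to nearest.
    scale = 2.0 ** (mantissa_bits - 1 - shared_exp)
    mantissas = np.clip(np.round(block * scale),
                        -(2 ** (mantissa_bits - 1)),
                        2 ** (mantissa_bits - 1) - 1).astype(np.int32)
    return mantissas, shared_exp

def bfp_dequantize(mantissas, shared_exp, mantissa_bits=8):
    """Recover approximate float values: a single shift (scale) per block."""
    return mantissas.astype(np.float32) * 2.0 ** (shared_exp - (mantissa_bits - 1))
```

Because every element in a block shares the exponent, the per-element storage drops to the mantissa width alone (8 bits for weights, 16 bits for feature maps in the paper's configuration), which is where the 75% and 50% reductions relative to 32-bit floating point come from.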
Author | Lian, Xiaocong; Liu, Zhenyu; Song, Zhourui; Dai, Jiwu; Zhou, Wei; Ji, Xiangyang |
Author details |
– Xiaocong Lian (lian900625@tsinghua.edu.cn; ORCID 0000-0003-3917-5273): Department of Automation, Tsinghua University, Beijing, China
– Zhenyu Liu (liuzhenyu73@mail.tsinghua.edu.cn): Tsinghua National Laboratory for Information Science and Technology, Research Institute of Information Technology, Tsinghua University, Beijing, China
– Zhourui Song (zrsong99@163.com): School of Cyberspace Security, Beijing University of Posts and Telecommunications (BUPT), Beijing, China
– Jiwu Dai (mr_dai333@mail.nwpu.edu.cn): School of Electronics and Information, Northwestern Polytechnical University, Xi'an, China
– Wei Zhou (zhouwei@nwpu.edu.cn; ORCID 0000-0001-9715-6957): School of Electronics and Information, Northwestern Polytechnical University, Xi'an, China
– Xiangyang Ji (xyji@tsinghua.edu.cn; ORCID 0000-0002-7333-9975): Department of Automation, Tsinghua University, Beijing, China |
CODEN | IEVSE9 |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2019 |
DOI | 10.1109/TVLSI.2019.2913958 |
Discipline | Engineering |
EISSN | 1557-9999 |
EndPage | 1885 |
Genre | orig-research |
GrantInformation | National Natural Science Foundation of China (funder ID 10.13039/501100001809), grants 61827804, 61836012, 61620106005, 61325003 |
ISSN | 1063-8210 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 8 |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
ORCID | 0000-0001-9715-6957 0000-0002-7333-9975 0000-0003-3917-5273 |
PageCount | 12 |
PublicationDate | 2019-08-01 |
PublicationPlace | New York |
PublicationTitle | IEEE transactions on very large scale integration (VLSI) systems |
PublicationTitleAbbrev | TVLSI |
PublicationYear | 2019 |
Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
StartPage | 1874 |
SubjectTerms | Accelerators; Artificial neural networks; Block floating point (BFP); Computational modeling; Computer memory; Computer vision; Convolutional neural network (CNN) accelerator; Embedded systems; Feature maps; Field-programmable gate array (FPGA); Floating point arithmetic; Hardware; Mathematical model; Memory management; Neural networks; Quantization (signal); Retraining; Rounding; Speech processing; Three-level parallelism |
Title | High-Performance FPGA-Based CNN Accelerator With Block-Floating-Point Arithmetic |
URI | https://ieeexplore.ieee.org/document/8716697 https://www.proquest.com/docview/2264437857 |
Volume | 27 |