FloatSD: A New Weight Representation and Associated Update Method for Efficient Convolutional Neural Network Training

Bibliographic Details
Published in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, Vol. 9, no. 2, pp. 267-279
Main Authors Lin, Po-Chen, Sun, Mu-Kai, Kung, Chuking, Chiueh, Tzi-Dar
Format Journal Article
Language English
Published Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.06.2019

Abstract In this paper, we propose a floating-point signed digit (FloatSD) format for convolutional neural network (CNN) weight representation and its update method during CNN training. The number of non-zero digits in a weight can be as few as two during the forward and backward passes of the CNN training, reducing the convolution multiplication to addition of two shifted multiplicands (partial products). Furthermore, the mantissa field and the exponent field of neuron activations and gradients during training are also quantized, leading to floating-point numbers represented by eight bits. We tested the FloatSD method using three popular CNN applications, namely, MNIST, CIFAR-10, and ImageNet. These three CNNs were trained from scratch using the conventional 32-bit floating-point arithmetic and the FloatSD weight representation, 8-bit floating-point numbers for activations and gradients, and half-precision 16-bit floating-point accumulation. We obtained FloatSD accuracy results very close to or even better than those trained in 32-bit floating-point arithmetic. Finally, the proposed method not only significantly reduces the computational complexity for CNN training but also achieves memory capacity and bandwidth saving of about three quarters, demonstrating its effectiveness in the low-complexity implementation of CNN training.
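To make the abstract's central idea concrete, a minimal Python sketch of the two-non-zero-digit weight representation follows. It is an illustration only, not the paper's exact FloatSD encoding; the function names and the exponent clamp range are hypothetical. A weight is greedily approximated by at most two signed power-of-two digits, so multiplying by it reduces to adding two shifted copies of the multiplicand, exactly the multiplier-free convolution the abstract describes.

import math

def to_two_signed_digits(w, min_exp=-8, max_exp=0):
    """Greedily approximate w by at most two signed power-of-two digits.

    Sketch only: the exponent range (min_exp, max_exp) is an assumed
    parameter, not taken from the paper.
    """
    digits = []                                # list of (sign, exponent) pairs
    residual = w
    for _ in range(2):                         # at most two non-zero digits
        if residual == 0.0:
            break
        sign = 1 if residual > 0 else -1
        e = round(math.log2(abs(residual)))    # nearest power-of-two exponent
        e = max(min_exp, min(max_exp, e))      # clamp to the assumed digit range
        digits.append((sign, e))
        residual -= sign * 2.0 ** e
    return digits

def multiply_by_shifts(x, digits):
    """x times the approximated weight: a sum of shifted multiplicands."""
    return sum(sign * math.ldexp(x, e) for sign, e in digits)

digits = to_two_signed_digits(0.3)             # [(1, -2), (1, -4)], i.e. 0.3125
print(digits, multiply_by_shifts(1.7, digits), 1.7 * 0.3)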
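The 8-bit floating-point quantization of activations and gradients can be sketched in the same spirit. The abstract fixes only the total width (eight bits), not the exponent/mantissa split, so the 4-bit-exponent/3-bit-mantissa layout below is an assumption for illustration, and round-to-nearest stands in for whatever rounding the paper actually uses.

import math

EXP_BITS, MAN_BITS = 4, 3                  # assumed split; the paper may differ
BIAS = 2 ** (EXP_BITS - 1) - 1             # IEEE-style exponent bias (7 here)

def quantize_fp8(x):
    """Round x to the nearest point of the assumed 8-bit float grid.

    Overflow saturation and subnormal handling are omitted in this sketch.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(abs(x))              # abs(x) = m * 2**e with 0.5 <= m < 1
    exp = e - 1                            # exponent of the leading bit
    exp = max(1 - BIAS, min(2 ** EXP_BITS - 2 - BIAS, exp))
    step = 2.0 ** (exp - MAN_BITS)         # grid spacing at this exponent
    return math.copysign(round(abs(x) / step) * step, x)

print(quantize_fp8(0.2718))                # -> 0.28125 on the assumed grid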
Author_xml – sequence: 1
  givenname: Po-Chen
  orcidid: 0000-0001-7838-3150
  surname: Lin
  fullname: Lin, Po-Chen
  email: alexlin5411@gmail.com
  organization: Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan
– sequence: 2
  givenname: Mu-Kai
  orcidid: 0000-0002-7165-8112
  surname: Sun
  fullname: Sun, Mu-Kai
  organization: Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan
– sequence: 3
  givenname: Chuking
  orcidid: 0000-0001-7268-5158
  surname: Kung
  fullname: Kung, Chuking
  organization: Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan
– sequence: 4
  givenname: Tzi-Dar
  orcidid: 0000-0003-0851-6629
  surname: Chiueh
  fullname: Chiueh, Tzi-Dar
  email: chiueh@ntu.edu.tw
  organization: Department of Electrical Engineering and the Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan
CODEN IJESLY
CitedBy_id 10.1016/j.jestch.2022.101153
10.4018/JOEUC.289222
10.1080/08839514.2022.2137650
10.1080/0952813X.2022.2092558
10.1109/TNNLS.2021.3082304
10.1109/TCSI.2020.2973537
10.1089/neu.2020.7281
10.1109/JETCAS.2021.3116044
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2019
DOI 10.1109/JETCAS.2019.2911999
Discipline Engineering
EISSN 2156-3365
EndPage 279
Genre orig-research
GrantInformation Ministry of Science and Technology, Taiwan, grant MOST 106-2221-E-002-238 (funder ID 10.13039/501100004663)
ISSN 2156-3357
IsPeerReviewed true
IsScholarly true
Issue 2
Language English
ORCID 0000-0003-0851-6629
0000-0002-7165-8112
0000-0001-7838-3150
0000-0001-7268-5158
PageCount 13
PublicationDate 2019-06-01
PublicationPlace Piscataway
PublicationTitle IEEE journal on emerging and selected topics in circuits and systems
PublicationTitleAbbrev JETCAS
PublicationYear 2019
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 267
SubjectTerms Artificial neural networks
CIFAR-10
Complexity
Computational modeling
Computer memory
Convolution
Convolutional neural network (CNN)
Convolutional neural networks
Digits
Floating point arithmetic
ImageNet
low-complexity training
MNIST
Multiplication
Neural networks
Neurons
Quantization (signal)
Representations
Training
Weight
weight quantization
Title FloatSD: A New Weight Representation and Associated Update Method for Efficient Convolutional Neural Network Training
URI https://ieeexplore.ieee.org/document/8693838
https://www.proquest.com/docview/2239665043/abstract/
Volume 9