FloatSD: A New Weight Representation and Associated Update Method for Efficient Convolutional Neural Network Training

Bibliographic Details
Published in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, Vol. 9, no. 2, pp. 267-279
Main Authors Lin, Po-Chen, Sun, Mu-Kai, Kung, Chuking, Chiueh, Tzi-Dar
Format Journal Article
Language English
Published Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.06.2019

Abstract In this paper, we propose a floating-point signed digit (FloatSD) format for convolutional neural network (CNN) weight representation and its update method during CNN training. The number of non-zero digits in a weight can be as few as two during the forward and backward passes of the CNN training, reducing the convolution multiplication to addition of two shifted multiplicands (partial products). Furthermore, the mantissa field and the exponent field of neuron activations and gradients during training are also quantized, leading to floating-point numbers represented by eight bits. We tested the FloatSD method using three popular CNN applications, namely, MNIST, CIFAR-10, and ImageNet. These three CNNs were trained from scratch using the conventional 32-bit floating-point arithmetic and the FloatSD weight representation, 8-bit floating-point numbers for activations and gradients, and half-precision 16-bit floating-point accumulation. We obtained FloatSD accuracy results very close to or even better than those trained in 32-bit floating-point arithmetic. Finally, the proposed method not only significantly reduces the computational complexity for CNN training but also achieves memory capacity and bandwidth saving of about three quarters, demonstrating its effectiveness in the low-complexity implementation of CNN training.
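To make the abstract's central idea concrete, a minimal Python sketch of the two-non-zero-digit weight representation follows. It is an illustration only, not the paper's exact FloatSD encoding; the function names and the exponent clamp range are hypothetical. A weight is greedily approximated by at most two signed power-of-two digits, so multiplying by it reduces to adding two shifted copies of the multiplicand, exactly the multiplier-free convolution the abstract describes.

import math

def to_two_signed_digits(w, min_exp=-8, max_exp=0):
    """Greedily approximate w by at most two signed power-of-two digits.

    Sketch only: the exponent range (min_exp, max_exp) is an assumed
    parameter, not taken from the paper.
    """
    digits = []                                # list of (sign, exponent) pairs
    residual = w
    for _ in range(2):                         # at most two non-zero digits
        if residual == 0.0:
            break
        sign = 1 if residual > 0 else -1
        e = round(math.log2(abs(residual)))    # nearest power-of-two exponent
        e = max(min_exp, min(max_exp, e))      # clamp to the assumed digit range
        digits.append((sign, e))
        residual -= sign * 2.0 ** e
    return digits

def multiply_by_shifts(x, digits):
    """x times the approximated weight: a sum of shifted multiplicands."""
    return sum(sign * math.ldexp(x, e) for sign, e in digits)

digits = to_two_signed_digits(0.3)             # [(1, -2), (1, -4)], i.e. 0.3125
print(digits, multiply_by_shifts(1.7, digits), 1.7 * 0.3)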
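The 8-bit floating-point quantization of activations and gradients can be sketched in the same spirit. The abstract fixes only the total width (eight bits), not the exponent/mantissa split, so the 4-bit-exponent/3-bit-mantissa layout below is an assumption for illustration, and round-to-nearest stands in for whatever rounding the paper actually uses.

import math

EXP_BITS, MAN_BITS = 4, 3                  # assumed split; the paper may differ
BIAS = 2 ** (EXP_BITS - 1) - 1             # IEEE-style exponent bias (7 here)

def quantize_fp8(x):
    """Round x to the nearest point of the assumed 8-bit float grid.

    Overflow saturation and subnormal handling are omitted in this sketch.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(abs(x))              # abs(x) = m * 2**e with 0.5 <= m < 1
    exp = e - 1                            # exponent of the leading bit
    exp = max(1 - BIAS, min(2 ** EXP_BITS - 2 - BIAS, exp))
    step = 2.0 ** (exp - MAN_BITS)         # grid spacing at this exponent
    return math.copysign(round(abs(x) / step) * step, x)

print(quantize_fp8(0.2718))                # -> 0.28125 on the assumed grid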
Author_xml – sequence: 1
  givenname: Po-Chen
  orcidid: 0000-0001-7838-3150
  surname: Lin
  fullname: Lin, Po-Chen
  email: alexlin5411@gmail.com
  organization: Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan
– sequence: 2
  givenname: Mu-Kai
  orcidid: 0000-0002-7165-8112
  surname: Sun
  fullname: Sun, Mu-Kai
  organization: Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan
– sequence: 3
  givenname: Chuking
  orcidid: 0000-0001-7268-5158
  surname: Kung
  fullname: Kung, Chuking
  organization: Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan
– sequence: 4
  givenname: Tzi-Dar
  orcidid: 0000-0003-0851-6629
  surname: Chiueh
  fullname: Chiueh, Tzi-Dar
  email: chiueh@ntu.edu.tw
  organization: Department of Electrical Engineering and the Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan
CODEN IJESLY
CitedBy_id 10.1016/j.jestch.2022.101153
10.4018/JOEUC.289222
10.1080/08839514.2022.2137650
10.1080/0952813X.2022.2092558
10.1109/TNNLS.2021.3082304
10.1109/TCSI.2020.2973537
10.1089/neu.2020.7281
10.1109/JETCAS.2021.3116044
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2019
DOI 10.1109/JETCAS.2019.2911999
Discipline Engineering
EISSN 2156-3365
EndPage 279
Genre orig-research
GrantInformation Ministry of Science and Technology, Taiwan, grant MOST 106-2221-E-002-238 (funder ID 10.13039/501100004663)
ISSN 2156-3357
IsPeerReviewed true
IsScholarly true
Issue 2
Language English
ORCID 0000-0003-0851-6629
0000-0002-7165-8112
0000-0001-7838-3150
0000-0001-7268-5158
PageCount 13
PublicationDate 2019-06-01
PublicationPlace Piscataway
PublicationTitle IEEE journal on emerging and selected topics in circuits and systems
PublicationTitleAbbrev JETCAS
PublicationYear 2019
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 267
SubjectTerms Artificial neural networks
CIFAR-10
Complexity
Computational modeling
Computer memory
Convolution
Convolutional neural network (CNN)
Convolutional neural networks
Digits
Floating point arithmetic
ImageNet
low-complexity training
MNIST
Multiplication
Neural networks
Neurons
Quantization (signal)
Representations
Training
Weight
weight quantization
Title FloatSD: A New Weight Representation and Associated Update Method for Efficient Convolutional Neural Network Training
URI https://ieeexplore.ieee.org/document/8693838
https://www.proquest.com/docview/2239665043/abstract/
Volume 9