Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

Bibliographic Details
Published in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2704-2713
Main Authors Jacob, Benoit; Kligys, Skirmantas; Chen, Bo; Zhu, Menglong; Tang, Matthew; Howard, Andrew; Adam, Hartwig; Kalenichenko, Dmitry
Format Conference Proceeding
Language English
Published IEEE 01.06.2018
ISSN 1063-6919
DOI 10.1109/CVPR.2018.00286

Abstract The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.
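The abstract names an integer-arithmetic-only quantization scheme without spelling it out in this record. As a rough illustration only, the sketch below assumes the affine mapping r = S(q - Z) commonly used for 8-bit quantization, where S is a real scale and Z an integer zero point. The names `QParams`, `choose_qparams`, `quantize`, `dequantize`, and `int_dot` are hypothetical and are not taken from the record; the fixed-point handling of the final rescale is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QParams:
    scale: float      # S: positive real step size (assumed notation)
    zero_point: int   # Z: integer code that represents real 0.0 exactly


def choose_qparams(rmin, rmax, qmin=0, qmax=255):
    """Pick S and Z so that [rmin, rmax] maps onto [qmin, qmax]."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin) or 1.0  # guard against a zero range
    zero_point = round(qmin - rmin / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return QParams(scale, zero_point)


def quantize(r, p, qmin=0, qmax=255):
    """Map a real value to an integer code, clamping to the code range."""
    q = round(r / p.scale) + p.zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, p):
    """Recover the approximate real value: r = S * (q - Z)."""
    return p.scale * (q - p.zero_point)


def int_dot(qa, pa, qb, pb):
    """Dot product accumulated in integers only.

    Returns the integer accumulator and the combined real scale; a real
    deployment would fold that scale into a fixed-point multiplier, which
    this sketch leaves out.
    """
    acc = sum((a - pa.zero_point) * (b - pb.zero_point) for a, b in zip(qa, qb))
    return acc, pa.scale * pb.scale
```

For example, `p = choose_qparams(-1.0, 3.0)` followed by `dequantize(quantize(1.0, p), p)` returns a value within half a quantization step of 1.0.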
Author_xml – sequence: 1
  givenname: Benoit
  surname: Jacob
  fullname: Jacob, Benoit
– sequence: 2
  givenname: Skirmantas
  surname: Kligys
  fullname: Kligys, Skirmantas
– sequence: 3
  givenname: Bo
  surname: Chen
  fullname: Chen, Bo
– sequence: 4
  givenname: Menglong
  surname: Zhu
  fullname: Zhu, Menglong
– sequence: 5
  givenname: Matthew
  surname: Tang
  fullname: Tang, Matthew
– sequence: 6
  givenname: Andrew
  surname: Howard
  fullname: Howard, Andrew
– sequence: 7
  givenname: Hartwig
  surname: Adam
  fullname: Adam, Hartwig
– sequence: 8
  givenname: Dmitry
  surname: Kalenichenko
  fullname: Kalenichenko, Dmitry
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/CVPR.2018.00286
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Xplore
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
EISBN 9781538664209
1538664208
EISSN 1063-6919
EndPage 2713
ExternalDocumentID 8578384
Genre orig-research
IEDL.DBID RIE
IngestDate Wed Aug 27 02:52:16 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 10
ParticipantIDs ieee_primary_8578384
PublicationCentury 2000
PublicationDate 2018-06
PublicationDateYYYYMMDD 2018-06-01
PublicationDate_xml – month: 06
  year: 2018
  text: 2018-06
PublicationDecade 2010
PublicationTitle 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
PublicationTitleAbbrev CVPR
PublicationYear 2018
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0002683845
ssj0003211698
SourceID ieee
SourceType Publisher
StartPage 2704
SubjectTerms Arrays
Computational modeling
Hardware
Neural networks
Quantization (signal)
Training
Title Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
URI https://ieeexplore.ieee.org/document/8578384