Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be impl...

Bibliographic Details
Published in	2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2704–2713
Main Authors	Jacob, Benoit; Kligys, Skirmantas; Chen, Bo; Zhu, Menglong; Tang, Matthew; Howard, Andrew; Adam, Hartwig; Kalenichenko, Dmitry
Format	Conference Proceeding
Language	English
Published	IEEE 01.06.2018
Subjects	Arrays; Computational modeling; Hardware; Neural networks; Quantization (signal); Training
ISSN	1063-6919
DOI	10.1109/CVPR.2018.00286

Abstract	The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.
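
As a rough illustration of what integer-only inference involves, the sketch below assumes an affine mapping r = S * (q - Z) between a real value r and an 8-bit integer q, with scale S and zero point Z. The function names are hypothetical and do not come from this record. The final rescale is left in floating point for brevity; an integer-only deployment would replace it with a fixed-point multiply.

```python
def choose_params(rmin, rmax, qmin=0, qmax=255):
    """Pick a scale and zero point for the real range [rmin, rmax]."""
    # Widen the range to include 0.0 so that real zero maps to an exact integer.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate range: every value is zero
    zero_point = int(round(qmin - rmin / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(r, scale, zero_point, qmin=0, qmax=255):
    """Map a real value to an integer code, clamping to [qmin, qmax]."""
    q = int(round(r / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Recover the approximate real value r = S * (q - Z)."""
    return scale * (q - zero_point)


def int_dot(qa, za, qb, zb):
    """Accumulate a dot product using integer arithmetic only."""
    return sum((x - za) * (y - zb) for x, y in zip(qa, qb))
```

With these helpers, a real-valued dot product of two vectors is approximated by `sa * sb * int_dot(qa, za, qb, zb)`, where `sa` and `sb` are the two scales; only the integer accumulation happens inside the loop.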
CODEN	IEEPAD
Discipline	Applied Sciences
EISBN	9781538664209 1538664208
EISSN	1063-6919
EndPage	2713
ExternalDocumentID	8578384
Genre	orig-research
IsPeerReviewed	false
IsScholarly	true
PageCount	10
PublicationTitle	2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
PublicationTitleAbbrev	CVPR
PublicationYear	2018
Publisher	IEEE
StartPage	2704
URI	https://ieeexplore.ieee.org/document/8578384