Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
Published in | 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2704–2713 |
---|---|
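Main Authors | Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko |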
Format | Conference Proceeding |
Language | English |
Published | IEEE, 2018-06-01 |
ISSN | 1063-6919 |
DOI | 10.1109/CVPR.2018.00286 |
Abstract | The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs. |
---|---|
Author | Jacob, Benoit; Kligys, Skirmantas; Chen, Bo; Zhu, Menglong; Tang, Matthew; Howard, Andrew; Adam, Hartwig; Kalenichenko, Dmitry |
CODEN | IEEPAD |
Discipline | Applied Sciences |
EISBN | 9781538664209; 1538664208 |
EISSN | 1063-6919 |
EndPage | 2713 |
ExternalDocumentID | 8578384 |
Genre | orig-research |
PageCount | 10 |
PublicationDate | 2018-06 |
PublicationDateYYYYMMDD | 2018-06-01 |
PublicationTitle | 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition |
PublicationTitleAbbrev | CVPR |
PublicationYear | 2018 |
Publisher | IEEE |
StartPage | 2704 |
SubjectTerms | Arrays; Computational modeling; Hardware; Neural networks; Quantization (signal); Training |
Title | Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference |
URI | https://ieeexplore.ieee.org/document/8578384 |
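
The abstract describes inference carried out with integer-only arithmetic. A minimal NumPy sketch of what that can look like for a single matrix product follows. It is written under stated assumptions: an affine mapping between real values r and 8-bit integer codes q, r ≈ S·(q − Z), with a real scale S and an integer zero point Z; products summed in a wide integer accumulator; and the combined rescale factor S1·S2/S3 applied as a fixed-point integer multiplier plus a rounding right shift. All function names (`choose_qparams`, `quantize`, `dequantize`, `quantize_multiplier`, `quantized_matmul`) are hypothetical, and this is an illustration, not the authors' code or any library's kernel.

```python
import numpy as np

QMIN, QMAX = 0, 255  # assumption: unsigned 8-bit integer codes


def choose_qparams(rmin, rmax):
    """Pick scale S and zero point Z so that r ~= S * (q - Z) covers [rmin, rmax]."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # keep real 0.0 exactly representable
    scale = (rmax - rmin) / (QMAX - QMIN) or 1.0  # avoid a zero scale for an all-zero range
    zero_point = int(round(QMIN - rmin / scale))
    return scale, max(QMIN, min(QMAX, zero_point))


def quantize(r, scale, zero_point):
    """Float -> integer codes (done once, offline or at the model input)."""
    q = np.round(np.asarray(r) / scale) + zero_point
    return np.clip(q, QMIN, QMAX).astype(np.int64)


def dequantize(q, scale, zero_point):
    """Integer codes -> float (used here only to check the result)."""
    return scale * (np.asarray(q, dtype=np.float64) - zero_point)


def quantize_multiplier(m):
    """Write 0 < m < 1 as m0 * 2**-(31 + shift), with m0 an integer in [2**30, 2**31)."""
    if not 0.0 < m < 1.0:
        raise ValueError("multiplier must lie in (0, 1)")
    shift = 0
    while m < 0.5:  # normalize m into [0.5, 1)
        m *= 2.0
        shift += 1
    m0 = int(round(m * (1 << 31)))
    if m0 == 1 << 31:  # rounding reached 1.0; halve m0 and adjust the shift
        m0 //= 2
        shift -= 1
    return m0, shift


def quantized_matmul(q1, z1, q2, z2, m0, shift, z3):
    """Integer-only product: q3 = Z3 + round(M * sum_k (q1 - Z1) * (q2 - Z2))."""
    # Zero-point-adjusted products summed in int64 (a wide accumulator; int64 is
    # used throughout this sketch so the later multiply by m0 cannot overflow).
    acc = (q1.astype(np.int64) - z1) @ (q2.astype(np.int64) - z2)
    # Multiply by the fixed-point m0, then shift right, rounding halves upward.
    total_shift = 31 + shift
    scaled = (acc * m0 + (1 << (total_shift - 1))) >> total_shift
    return np.clip(scaled + z3, QMIN, QMAX).astype(np.int64)


if __name__ == "__main__":
    a = np.array([[0.5, -0.25, 1.0], [-1.0, 0.75, 0.1]])
    b = np.array([[0.2, -0.6], [0.9, 0.3], [-0.4, 0.8]])
    ref = a @ b  # float reference, used only to pick output params and compare
    s1, z1 = choose_qparams(a.min(), a.max())
    s2, z2 = choose_qparams(b.min(), b.max())
    s3, z3 = choose_qparams(ref.min(), ref.max())
    m0, shift = quantize_multiplier(s1 * s2 / s3)
    q3 = quantized_matmul(quantize(a, s1, z1), z1, quantize(b, s2, z2), z2, m0, shift, z3)
    print("max abs error:", np.abs(dequantize(q3, s3, z3) - ref).max())
```

Only `quantized_matmul` sits on the inference path, and it uses integer multiplies, adds and shifts exclusively; the floating-point work in the other helpers runs once, when S, Z and the multiplier are chosen, or serves only to check the result. Rounding halves upward in the final shift is one reasonable choice among several.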