Rethinking the Inception Architecture for Computer Vision

Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014, very deep convolutional networks have become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend...

Bibliographic Details
Published in	2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2818-2826
Main Authors	Szegedy, Christian; Vanhoucke, Vincent; Ioffe, Sergey; Shlens, Jon; Wojna, Zbigniew
Format	Conference Proceeding
Language	English
Published	IEEE 09.12.2016
Subjects	Benchmark testing; Computational efficiency; Computational modeling; Computer architecture; Computer vision; Convolution; Training

Abstract	Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014, very deep convolutional networks have become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most tasks (as long as enough labeled data is provided for training), computational efficiency and low parameter count are still enabling factors for various use cases such as mobile vision and big-data scenarios. Here we explore ways to scale up networks that aim to use the added computation as efficiently as possible through suitably factorized convolutions and aggressive regularization. We benchmark our methods on the ILSVRC 2012 classification challenge validation set and demonstrate substantial gains over the state of the art: 21.2% top-1 and 5.6% top-5 error for single-frame evaluation using a network with a computational cost of 5 billion multiply-adds per inference and fewer than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, we report 3.5% top-5 error and 17.3% top-1 error on the validation set and 3.6% top-5 error on the official test set.
Author	Ioffe, Sergey; Wojna, Zbigniew; Vanhoucke, Vincent; Szegedy, Christian; Shlens, Jon
Author_xml	– sequence: 1 givenname: Christian surname: Szegedy fullname: Szegedy, Christian email: szegedy@google.com – sequence: 2 givenname: Vincent surname: Vanhoucke fullname: Vanhoucke, Vincent email: vanhoucke@google.com – sequence: 3 givenname: Sergey surname: Ioffe fullname: Ioffe, Sergey email: sioffe@google.com – sequence: 4 givenname: Jon surname: Shlens fullname: Shlens, Jon email: shlens@google.com – sequence: 5 givenname: Zbigniew surname: Wojna fullname: Wojna, Zbigniew email: zbigniewwojna@gmail.com
CODEN	IEEPAD
ContentType	Conference Proceeding
DBID	6IE 6IH CBEJK RIE RIO
DOI	10.1109/CVPR.2016.308
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Applied Sciences Computer Science
EISBN	9781467388511 1467388513
EISSN	1063-6919
EndPage	2826
ExternalDocumentID	7780677
Genre	orig-research
GroupedDBID	23M 29F 29O 6IE 6IH 6IK ABDPE ACGFS ALMA_UNASSIGNED_HOLDINGS CBEJK IPLJI M43 RIE RIO RNS
ID	FETCH-LOGICAL-i241t-b303bc045cbc4b7229c1c56fa4f1e5f29e27e25a2cecf2d3dae05eea68b698513
IEDL.DBID	RIE
IngestDate	Wed Aug 27 01:54:52 EDT 2025
IsPeerReviewed	false
IsScholarly	true
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-i241t-b303bc045cbc4b7229c1c56fa4f1e5f29e27e25a2cecf2d3dae05eea68b698513
PageCount	9
ParticipantIDs	ieee_primary_7780677
PublicationCentury	2000
PublicationDate	2016-12-09
PublicationDateYYYYMMDD	2016-12-09
PublicationDate_xml	– month: 12 year: 2016 text: 2016-12-09 day: 09
PublicationDecade	2010
PublicationTitle	2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
PublicationTitleAbbrev	CVPR
PublicationYear	2016
Publisher	IEEE
Publisher_xml	– name: IEEE
SSID	ssj0001968189 ssj0023720 ssj0003211698
Score	2.6073081
SourceID	ieee
SourceType	Publisher
StartPage	2818
SubjectTerms	Benchmark testing; Computational efficiency; Computational modeling; Computer architecture; Computer vision; Convolution; Training
Title	Rethinking the Inception Architecture for Computer Vision
URI	https://ieeexplore.ieee.org/document/7780677
hasFullText	1
inHoldings	1
isFullTextHit
isPrint