Rethinking the Inception Architecture for Computer Vision

Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014, very deep convolutional networks have become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend...

Bibliographic Details
Published in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2818-2826
Main Authors Szegedy, Christian; Vanhoucke, Vincent; Ioffe, Sergey; Shlens, Jon; Wojna, Zbigniew
Format Conference Proceeding
Language English
Published IEEE 09.12.2016
Subjects
Abstract Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014, very deep convolutional networks have become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most tasks (as long as enough labeled data is provided for training), computational efficiency and low parameter count are still enabling factors for various use cases such as mobile vision and big-data scenarios. Here we explore ways to scale up networks that aim to use the added computation as efficiently as possible through suitably factorized convolutions and aggressive regularization. We benchmark our methods on the ILSVRC 2012 classification challenge validation set and demonstrate substantial gains over the state of the art: 21.2% top-1 and 5.6% top-5 error for single-frame evaluation using a network with a computational cost of 5 billion multiply-adds per inference and fewer than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, we report 3.5% top-5 error and 17.3% top-1 error on the validation set and 3.6% top-5 error on the official test set.
Author Ioffe, Sergey
Wojna, Zbigniew
Vanhoucke, Vincent
Szegedy, Christian
Shlens, Jon
Author_xml – sequence: 1
  givenname: Christian
  surname: Szegedy
  fullname: Szegedy, Christian
  email: szegedy@google.com
– sequence: 2
  givenname: Vincent
  surname: Vanhoucke
  fullname: Vanhoucke, Vincent
  email: vanhoucke@google.com
– sequence: 3
  givenname: Sergey
  surname: Ioffe
  fullname: Ioffe, Sergey
  email: sioffe@google.com
– sequence: 4
  givenname: Jon
  surname: Shlens
  fullname: Shlens, Jon
  email: shlens@google.com
– sequence: 5
  givenname: Zbigniew
  surname: Wojna
  fullname: Wojna, Zbigniew
  email: zbigniewwojna@gmail.com
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/CVPR.2016.308
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
Computer Science
EISBN 9781467388511
1467388513
EISSN 1063-6919
EndPage 2826
ExternalDocumentID 7780677
Genre orig-research
GroupedDBID 23M
29F
29O
6IE
6IH
6IK
ABDPE
ACGFS
ALMA_UNASSIGNED_HOLDINGS
CBEJK
IPLJI
M43
RIE
RIO
RNS
ID FETCH-LOGICAL-i241t-b303bc045cbc4b7229c1c56fa4f1e5f29e27e25a2cecf2d3dae05eea68b698513
IEDL.DBID RIE
IngestDate Wed Aug 27 01:54:52 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i241t-b303bc045cbc4b7229c1c56fa4f1e5f29e27e25a2cecf2d3dae05eea68b698513
PageCount 9
ParticipantIDs ieee_primary_7780677
PublicationCentury 2000
PublicationDate 2016-12-09
PublicationDateYYYYMMDD 2016-12-09
PublicationDate_xml – month: 12
  year: 2016
  text: 2016-12-09
  day: 09
PublicationDecade 2010
PublicationTitle 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
PublicationTitleAbbrev CVPR
PublicationYear 2016
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0001968189
ssj0023720
ssj0003211698
Score 2.6073081
SourceID ieee
SourceType Publisher
StartPage 2818
SubjectTerms Benchmark testing
Computational efficiency
Computational modeling
Computer architecture
Computer vision
Convolution
Training
Title Rethinking the Inception Architecture for Computer Vision
URI https://ieeexplore.ieee.org/document/7780677
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV1LSwMxEA5tT56qtuKbHDy6225euzlKsahQKcWW3koeExRhK7q9-OtNdrfbIh68JcMuhAnhm2S-mQ-hG-EsTY2zkSPaRMxlNtJDpyMPjTqjglpaSrJMnsXDnD0t-bKFbptaGAAoyWcQh2GZy7drswlPZYM0zULDszZq-4tbVau1e0-RwmOPbObU32yEbDIKJKix7HpsDkaL6SwQu0RMg7LknrJKCSzjLppsl1TxSd7jTaFj8_2rW-N_13yI-rsSPjxtwOkItSA_Rt065sT1if7ypq2sw9bWQ3IGxWslqYB9eIgf85r6gu_2sg7YR7u4-XlRlqj30Xx8_zJ6iGqFhejNI3cRaQ9g2viozmjDdEqINInhwinmEuCOSCApEK6IAeOIpVbBkAMokWnvW57QE9TJ1zmcIsxVAsrZxDFDWSaF8qG7GSpLFAPmvz9DveCc1UfVRGNV--X8b_MFOgibU_JG5CXqFJ8buPLoX-jrctt_AHUsrno
linkProvider IEEE
linkToHtml http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV1NT8JAEN0gHvSECsZv9-DRFrrb3bZHQzSgQAgBwo3sx2w0JsVoufjr3W1LIcaDt3bSJptpm_e682YeQnfcaBopoz1DpPJCE2tPdoz0LDTKmHKqaW7JMhzx3ix8XrBFDd1XvTAAkIvPwHeHeS1fr9TabZW1oyh2A8_20L7FfRYU3VrbHZWEW_RJqnNq_214UtUUiPNj2U7ZbHfn44mTdnGfOm_JHW-VHFqeGmi4WVShKHn315n01feveY3_XfURam2b-PC4gqdjVIP0BDVK1onLb_rLhjbGDptYEyUTyF4LUwVsCSLup6X4BT_s1B2w5bu4unmeN6m30OzpcdrteaXHgvdmsTvzpIUwqSyvU1KFMiIkUYFi3IjQBMAMSYBEQJggCpQhmmoBHQYgeCxtbllAT1E9XaVwhjATAQijAxMqGsYJF5a8q47QRIQQ2uvPUdMlZ_lRjNFYlnm5-Dt8iw560-FgOeiPXi7RoXtQuYokuUL17HMN15YLZPImfwV-AMBuscM
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2016+IEEE+Conference+on+Computer+Vision+and+Pattern+Recognition+%28CVPR%29&rft.atitle=Rethinking+the+Inception+Architecture+for+Computer+Vision&rft.au=Szegedy%2C+Christian&rft.au=Vanhoucke%2C+Vincent&rft.au=Ioffe%2C+Sergey&rft.au=Shlens%2C+Jon&rft.date=2016-12-09&rft.pub=IEEE&rft.eissn=1063-6919&rft.spage=2818&rft.epage=2826&rft_id=info:doi/10.1109%2FCVPR.2016.308&rft.externalDocID=7780677