Deep Neural Network Learning with Second-Order Optimizers -- a Practical Study with a Stochastic Quasi-Gauss-Newton Method

Bibliographic Details
Main Authors Thiele, Christopher; Araya-Polo, Mauricio; Hohl, Detlef
Format Journal Article
Language English
Published 06.04.2020
Online Access Get full text

Abstract Training in supervised deep learning is computationally demanding, and the convergence behavior is usually not fully understood. We introduce and study a second-order stochastic quasi-Gauss-Newton (SQGN) optimization method that combines ideas from stochastic quasi-Newton methods, Gauss-Newton methods, and variance reduction to address this problem. SQGN provides excellent accuracy without the need for experimenting with many hyper-parameter configurations, which is often computationally prohibitive given the number of combinations and the cost of each training process. We discuss the implementation of SQGN with TensorFlow, and we compare its convergence and computational performance to selected first-order methods using the MNIST benchmark and a large-scale seismic tomography application from Earth science.
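The abstract describes an optimizer that replaces the Hessian with a Gauss-Newton approximation and evaluates it on stochastic minibatches. As a rough illustration only (not the paper's SQGN implementation, which adds quasi-Newton curvature estimates and variance reduction), a single stochastic Gauss-Newton step on a toy nonlinear least-squares model can be sketched as follows; the model, data, and all names here are hypothetical:

```python
import numpy as np

# Hypothetical toy model: y ≈ exp(a * x) + b, parameters theta = (a, b).
rng = np.random.default_rng(0)

def residuals(theta, x, y):
    a, b = theta
    return np.exp(a * x) + b - y

def jacobian(theta, x):
    a, b = theta
    # Columns: d r / d a = x * exp(a x),  d r / d b = 1
    return np.stack([x * np.exp(a * x), np.ones_like(x)], axis=1)

def stochastic_gauss_newton_step(theta, x, y, batch, damping=1e-3):
    """One Gauss-Newton step on a random minibatch, with a small
    Levenberg-style damping term for numerical stability."""
    idx = rng.choice(len(x), size=batch, replace=False)
    r = residuals(theta, x[idx], y[idx])
    J = jacobian(theta, x[idx])
    H = J.T @ J + damping * np.eye(len(theta))  # Gauss-Newton Hessian approx.
    g = J.T @ r                                  # minibatch gradient
    return theta - np.linalg.solve(H, g)

# Synthetic data from true parameters (0.5, 1.0) with small noise.
x = rng.uniform(0.0, 2.0, 200)
y = np.exp(0.5 * x) + 1.0 + 0.01 * rng.normal(size=200)

theta = np.array([0.1, 0.0])
for _ in range(50):
    theta = stochastic_gauss_newton_step(theta, x, y, batch=32)
```

Because `J.T @ J` only needs first derivatives, this curvature approximation avoids second derivatives of the model entirely, which is one reason Gauss-Newton-type methods are attractive for deep networks; the paper's contribution is making such steps practical at scale with quasi-Newton updates and variance-reduced minibatch gradients.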
Copyright http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI 10.48550/arxiv.2004.03040
OpenAccessLink https://arxiv.org/abs/2004.03040
SecondaryResourceType preprint
SubjectTerms Computer Science - Learning
Statistics - Machine Learning