Deep Neural Network Learning with Second-Order Optimizers -- a Practical Study with a Stochastic Quasi-Gauss-Newton Method

Bibliographic Details
Main Authors	Thiele, Christopher, Araya-Polo, Mauricio, Hohl, Detlef
Format	Journal Article
Language	English
Published	6 April 2020
Subjects	Computer Science - Learning; Statistics - Machine Learning
Online Access	https://arxiv.org/abs/2004.03040

Abstract	Training in supervised deep learning is computationally demanding, and the convergence behavior is usually not fully understood. We introduce and study a second-order stochastic quasi-Gauss-Newton (SQGN) optimization method that combines ideas from stochastic quasi-Newton methods, Gauss-Newton methods, and variance reduction to address this problem. SQGN provides excellent accuracy without the need for experimenting with many hyper-parameter configurations, which is often computationally prohibitive given the number of combinations and the cost of each training process. We discuss the implementation of SQGN with TensorFlow, and we compare its convergence and computational performance to selected first-order methods using the MNIST benchmark and a large-scale seismic tomography application from Earth science.
Copyright	http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI	10.48550/arxiv.2004.03040
Open Access	Yes
Peer Reviewed	No
Resource Type	Preprint (arXiv)