Deep Neural Network Learning with Second-Order Optimizers -- a Practical Study with a Stochastic Quasi-Gauss-Newton Method
Main Authors | Thiele, Christopher; Araya-Polo, Mauricio; Hohl, Detlef |
---|---|
Format | Journal Article |
Language | English |
Published | 06.04.2020 |
Subjects | Computer Science - Learning; Statistics - Machine Learning |
Online Access | https://arxiv.org/abs/2004.03040 |
Abstract | Training in supervised deep learning is computationally demanding, and the
convergence behavior is usually not fully understood. We introduce and study a
second-order stochastic quasi-Gauss-Newton (SQGN) optimization method that
combines ideas from stochastic quasi-Newton methods, Gauss-Newton methods, and
variance reduction to address this problem. SQGN provides excellent accuracy
without the need for experimenting with many hyper-parameter configurations,
which is often computationally prohibitive given the number of combinations and
the cost of each training process. We discuss the implementation of SQGN with
TensorFlow, and we compare its convergence and computational performance to
selected first-order methods using the MNIST benchmark and a large-scale
seismic tomography application from Earth science. |
---|---|
Author | Thiele, Christopher; Araya-Polo, Mauricio; Hohl, Detlef |
BackLink | https://doi.org/10.48550/arXiv.2004.03040 (view paper in arXiv) |
ContentType | Journal Article |
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
DOI | 10.48550/arxiv.2004.03040 |
DatabaseName | arXiv Computer Science arXiv Statistics arXiv.org |
IngestDate | Mon Jan 08 05:50:07 EST 2024 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
OpenAccessLink | https://arxiv.org/abs/2004.03040 |
PublicationDate | 2020-04-06 |
SecondaryResourceType | preprint |
SourceType | Open Access Repository |
SubjectTerms | Computer Science - Learning; Statistics - Machine Learning |
Title | Deep Neural Network Learning with Second-Order Optimizers -- a Practical Study with a Stochastic Quasi-Gauss-Newton Method |
URI | https://arxiv.org/abs/2004.03040 |