Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions

Bibliographic Details
Main Authors: Feinberg, Vladimir; Chen, Xinyi; Sun, Y. Jennifer; Anil, Rohan; Hazan, Elad
Format: Journal Article
Language: English
Published: 07.02.2023

Abstract: Adaptive regularization methods that exploit more than the diagonal entries exhibit state-of-the-art performance for many tasks, but can be prohibitive in terms of memory and running time. We find the spectra of the Kronecker-factored gradient covariance matrix in deep learning (DL) training tasks are concentrated on a small leading eigenspace that changes throughout training, motivating a low-rank sketching approach. We describe a generic method for reducing memory and compute requirements of maintaining a matrix preconditioner using the Frequent Directions (FD) sketch. While previous approaches have explored applying FD for second-order optimization, we present a novel analysis that allows efficient interpolation between resource requirements and the degradation in regret guarantees with rank $k$: in the online convex optimization (OCO) setting over dimension $d$, we match full-matrix $d^2$ memory regret using only $dk$ memory up to additive error in the bottom $d-k$ eigenvalues of the gradient covariance. Further, we show extensions of our work to Shampoo, resulting in a method competitive in quality with Shampoo and Adam, yet requiring only sub-linear memory for tracking second moments.
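To make the memory claim concrete, the following is a minimal sketch of the textbook Frequent Directions update, assuming NumPy is available. It is not the paper's optimizer or its Shampoo extension; the function names `frequent_directions` and `covariance_error`, and the parameter `ell` (standing in for the rank $k$), are hypothetical names chosen for illustration. The sketch stores an `ell x d` matrix $B$ in place of the `d x d` covariance $A^\top A$, and this particular variant is known to satisfy $\|A^\top A - B^\top B\|_2 \le \|A\|_F^2 / \ell$.

```python
import numpy as np


def frequent_directions(rows, ell):
    """Stream `rows` (n x d) into an ell x d Frequent Directions sketch."""
    rows = np.asarray(rows, dtype=float)
    _, d = rows.shape
    if not 1 <= ell <= d:
        raise ValueError("this sketch assumes 1 <= ell <= d")
    sketch = np.zeros((ell, d))
    for g in rows:
        # Invariant: the last row of the sketch is zero, so it can hold g.
        sketch[-1] = g
        # Rotate onto the singular directions of the current sketch.
        _, s, vt = np.linalg.svd(sketch, full_matrices=False)
        # Subtract the smallest squared singular value from every direction;
        # this zeroes out the last row again.
        shrunk = np.sqrt(np.maximum(s**2 - s[-1] ** 2, 0.0))
        sketch = shrunk[:, None] * vt
    return sketch


def covariance_error(rows, sketch):
    """Spectral norm of A^T A - B^T B for data A and sketch B."""
    rows = np.asarray(rows, dtype=float)
    diff = rows.T @ rows - sketch.T @ sketch
    return np.linalg.norm(diff, 2)


# Illustrative usage on synthetic "gradients" (random data, not from the paper).
rng = np.random.default_rng(0)
grads = rng.standard_normal((200, 20))
B = frequent_directions(grads, ell=5)
print(B.shape, covariance_error(grads, B) <= np.sum(grads**2) / 5)
```

Running an SVD on every incoming row keeps the code short; practical implementations usually buffer several rows before each shrink step to amortize that cost.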
Copyright http://creativecommons.org/licenses/by/4.0
DOI 10.48550/arxiv.2302.03764
OpenAccessLink https://arxiv.org/abs/2302.03764
SubjectTerms Computer Science - Artificial Intelligence
Computer Science - Learning
Statistics - Machine Learning