Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions

Bibliographic Details
Main Authors	Feinberg, Vladimir; Chen, Xinyi; Sun, Y. Jennifer; Anil, Rohan; Hazan, Elad
Format	Journal Article (preprint, not peer reviewed)
Language	English
Published	2023-02-07
Subjects	Computer Science - Artificial Intelligence; Computer Science - Learning; Statistics - Machine Learning

Abstract	Adaptive regularization methods that exploit more than the diagonal entries exhibit state-of-the-art performance for many tasks, but can be prohibitive in terms of memory and running time. We find that the spectra of the Kronecker-factored gradient covariance matrix in deep learning (DL) training tasks are concentrated on a small leading eigenspace that changes throughout training, motivating a low-rank sketching approach. We describe a generic method for reducing the memory and compute requirements of maintaining a matrix preconditioner using the Frequent Directions (FD) sketch. While previous approaches have explored applying FD to second-order optimization, we present a novel analysis that allows efficient interpolation between resource requirements and the degradation in regret guarantees with rank $k$: in the online convex optimization (OCO) setting over dimension $d$, we match the regret of full-matrix methods, which need $d^2$ memory, using only $dk$ memory, up to an additive error in the bottom $d-k$ eigenvalues of the gradient covariance. Further, we show extensions of our work to Shampoo, resulting in a method competitive in quality with Shampoo and Adam, yet requiring only sub-linear memory for tracking second moments.
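
To make the memory trade-off in the abstract concrete, the following is a minimal NumPy sketch of the standard Frequent Directions update applied to a stream of gradients. It is an illustration under stated assumptions, not the authors' implementation: the function name `fd_update`, the synthetic gradients, the sizes `d`, `ell`, `T`, and the running total `rho` of discarded spectral mass are all hypothetical choices made for this example. It keeps an `ell x d` sketch `B` in place of the `d x d` covariance and checks that the covariance error stays between zero and `rho` times the identity.

```python
import numpy as np

def fd_update(B, g):
    """One Frequent Directions step: fold gradient g into the sketch B (ell x d).

    Assumes d >= ell + 1. Returns the new sketch and the spectral mass
    removed from every retained direction in this step.
    """
    ell = B.shape[0]
    stacked = np.vstack([B, g[None, :]])              # (ell + 1) x d
    _, s, Vt = np.linalg.svd(stacked, full_matrices=False)
    delta = s[ell] ** 2                               # smallest squared singular value
    shrunk = np.sqrt(np.maximum(s[:ell] ** 2 - delta, 0.0))
    return shrunk[:, None] * Vt[:ell], delta

rng = np.random.default_rng(0)
d, ell, T = 32, 4, 200                                # hypothetical sizes

# Synthetic gradients (an assumption): dominant rank-3 signal plus small noise,
# mimicking a covariance spectrum concentrated on a small leading eigenspace.
basis = rng.standard_normal((3, d))
G = rng.standard_normal((T, 3)) @ basis + 0.01 * rng.standard_normal((T, d))

B = np.zeros((ell, d))
rho = 0.0                                             # cumulative discarded mass
for g in G:
    B, delta = fd_update(B, g)
    rho += delta

# Error of the sketched covariance relative to the exact one.
err = G.T @ G - B.T @ B
eigs = np.linalg.eigvalsh(err)                        # ascending order

print("sketch floats:", B.size, "vs full covariance floats:", d * d)
print("error eigenvalue range:", eigs[0], eigs[-1], "discarded mass rho:", rho)
```

Here the sketch stores `ell * d` numbers rather than `d * d`, and the error it introduces is controlled by the mass discarded at each shrink step, which is small when the gradient covariance has few large eigenvalues.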
Copyright	CC BY 4.0 (http://creativecommons.org/licenses/by/4.0)
DOI	10.48550/arxiv.2302.03764
OpenAccessLink	https://arxiv.org/abs/2302.03764