Analyzing inexact hypergradients for bilevel learning

Bibliographic Details
Published in	IMA Journal of Applied Mathematics, Vol. 89, no. 1, pp. 254-278
Main Authors	Ehrhardt, Matthias J; Roberts, Lindon
Format	Journal Article
Language	English
Published	Oxford University Press, 21 June 2024
Subjects	automatic differentiation; hyperparameter optimization; bilevel optimization

Abstract	Estimating hyperparameters has been a long-standing problem in machine learning. We consider the case where the task at hand is modeled as the solution to an optimization problem. Here the exact gradient with respect to the hyperparameters cannot be feasibly computed and approximate strategies are required. We introduce a unified framework for computing hypergradients that generalizes existing methods based on the implicit function theorem and automatic differentiation/backpropagation, showing that these two seemingly disparate approaches are actually tightly connected. Our framework is extremely flexible, allowing its subproblems to be solved with any suitable method, to any degree of accuracy. We derive a priori and computable a posteriori error bounds for all our methods and numerically show that our a posteriori bounds are usually more accurate. Our numerical results also show that, surprisingly, for efficient bilevel optimization, the choice of hypergradient algorithm is at least as important as the choice of lower-level solver.
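
To make the implicit-function-theorem route mentioned in the abstract concrete, here is a minimal sketch. It is not the authors' framework or code. It assumes a made-up lower-level problem (ridge regression with regularization weight exp(theta)), a made-up upper-level validation loss, gradient descent as the inexact lower-level solver and a fixed number of conjugate gradient steps for the linear system. All function names, data and iteration counts are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic training and validation data (assumed, for illustration only)
A, b = rng.standard_normal((30, 5)), rng.standard_normal(30)
A_val, b_val = rng.standard_normal((20, 5)), rng.standard_normal(20)


def lower_hessian(theta):
    # Hessian in x of g(x, theta) = 0.5*||Ax - b||^2 + 0.5*exp(theta)*||x||^2
    return A.T @ A + np.exp(theta) * np.eye(A.shape[1])


def solve_lower(theta, n_iters):
    # Inexact lower-level solve: n_iters gradient descent steps from zero
    H = lower_hessian(theta)
    step = 1.0 / np.linalg.norm(H, 2)  # step size 1/L, L = largest eigenvalue
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        x = x - step * (H @ x - A.T @ b)  # gradient of g in x
    return x


def conjugate_gradient(H, rhs, n_iters):
    # Inexact solve of H q = rhs using at most n_iters CG iterations
    q = np.zeros_like(rhs)
    r = rhs.copy()
    p = r.copy()
    for _ in range(n_iters):
        rr = r @ r
        if rr < 1e-30:  # residual is numerically zero, stop early
            break
        Hp = H @ p
        alpha = rr / (p @ Hp)
        q = q + alpha * p
        r = r - alpha * Hp
        p = r + ((r @ r) / rr) * p
    return q


def upper_loss(x):
    # Upper-level objective: validation error of the lower-level solution
    return 0.5 * np.sum((A_val @ x - b_val) ** 2)


def hypergradient(theta, lower_iters, cg_iters):
    # Implicit function theorem: d/dtheta upper_loss(x*(theta))
    #   = -(d/dtheta grad_x g)^T H^{-1} grad upper_loss(x*),
    # evaluated at an approximate x and with an approximate linear solve.
    x = solve_lower(theta, lower_iters)
    grad_f = A_val.T @ (A_val @ x - b_val)
    q = conjugate_gradient(lower_hessian(theta), grad_f, cg_iters)
    cross = np.exp(theta) * x  # derivative of grad_x g with respect to theta
    return -cross @ q
```

In this sketch, `hypergradient(0.3, 2000, 10)` solves both subproblems almost exactly, whereas `hypergradient(0.3, 3, 1)` is much cheaper but less accurate. How much accuracy is lost for a given amount of work is the kind of question that a priori and a posteriori error bounds are meant to answer.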
Author details	Ehrhardt, Matthias J (ORCID 0000-0001-8523-353X; m.ehrhardt@bath.ac.uk); Roberts, Lindon (ORCID 0000-0001-6438-9703; lindon.roberts@sydney.edu.au)
Copyright	The Author(s) 2023. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.
DOI	10.1093/imamat/hxad035
Discipline	Mathematics
EISSN	1464-3634
ISSN	0272-4960
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
License	This is an Open Access article distributed under the terms of the Creative Commons Attribution NonCommercial-NoDerivs licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
OpenAccessLink	https://dx.doi.org/10.1093/imamat/hxad035
PageCount	25