Analyzing inexact hypergradients for bilevel learning
Estimating hyperparameters has been a long-standing problem in machine learning. We consider the case where the task at hand is modeled as the solution to an optimization problem. Here the exact gradient with respect to the hyperparameters cannot be feasibly computed and approximate strategies are r...
Published in | IMA journal of applied mathematics Vol. 89; no. 1; pp. 254 - 278 |
---|---|
Main Authors | Ehrhardt, Matthias J; Roberts, Lindon |
Format | Journal Article |
Language | English |
Published | Oxford University Press, 21.06.2024 |
Abstract | Estimating hyperparameters has been a long-standing problem in machine learning. We consider the case where the task at hand is modeled as the solution to an optimization problem. Here the exact gradient with respect to the hyperparameters cannot be feasibly computed and approximate strategies are required. We introduce a unified framework for computing hypergradients that generalizes existing methods based on the implicit function theorem and automatic differentiation/backpropagation, showing that these two seemingly disparate approaches are actually tightly connected. Our framework is extremely flexible, allowing its subproblems to be solved with any suitable method, to any degree of accuracy. We derive a priori and computable a posteriori error bounds for all our methods and numerically show that our a posteriori bounds are usually more accurate. Our numerical results also show that, surprisingly, for efficient bilevel optimization, the choice of hypergradient algorithm is at least as important as the choice of lower-level solver. |
---|---|
Author | Ehrhardt, Matthias J; Roberts, Lindon |
Author_xml | 1. Ehrhardt, Matthias J (ORCID: 0000-0001-8523-353X; email: m.ehrhardt@bath.ac.uk); 2. Roberts, Lindon (ORCID: 0000-0001-6438-9703; email: lindon.roberts@sydney.edu.au) |
CitedBy_id | crossref_primary_10_1088_1361_6420_ad797a |
ContentType | Journal Article |
Copyright | The Author(s) 2023. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved. 2023 |
DBID | TOX AAYXX CITATION |
DOI | 10.1093/imamat/hxad035 |
DatabaseName | Oxford Journals Open Access Collection CrossRef |
DatabaseTitle | CrossRef |
DatabaseTitleList | CrossRef |
Database_xml | – sequence: 1 dbid: TOX name: Oxford Journals Open Access Collection url: https://academic.oup.com/journals/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Mathematics |
EISSN | 1464-3634 |
EndPage | 278 |
ExternalDocumentID | 10_1093_imamat_hxad035 10.1093/imamat/hxad035 |
IEDL.DBID | TOX |
ISSN | 0272-4960 |
IngestDate | Thu Apr 24 23:06:03 EDT 2025 Tue Jul 01 01:59:18 EDT 2025 Mon Jun 30 08:34:47 EDT 2025 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 1 |
Keywords | automatic differentiation; hyperparameter optimization; bilevel optimization |
Language | English |
License | This is an Open Access article distributed under the terms of the Creative Commons Attribution NonCommercial-NoDerivs licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact journals.permissions@oup.com. |
LinkModel | DirectLink |
ORCID | 0000-0001-6438-9703 0000-0001-8523-353X |
OpenAccessLink | https://dx.doi.org/10.1093/imamat/hxad035 |
PageCount | 25 |
PublicationCentury | 2000 |
PublicationDate | 2024-06-21 |
PublicationDateYYYYMMDD | 2024-06-21 |
PublicationDecade | 2020 |
PublicationTitle | IMA journal of applied mathematics |
PublicationYear | 2024 |
Publisher | Oxford University Press |
StartPage | 254 |
Title | Analyzing inexact hypergradients for bilevel learning |
Volume | 89 |
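
As a rough illustration of the implicit-function-theorem route to hypergradients mentioned in the abstract, the sketch below works through a toy problem. It is not the paper's framework or any of its algorithms. Everything in it is an assumption made for illustration: the lower-level problem is taken to be ridge regression, x*(theta) = argmin_x 0.5*||A x - b||^2 + 0.5*theta*||x||^2, the upper-level objective is F(theta) = 0.5*||x*(theta) - x_target||^2, and the names `fixed_step_solve` and `inexact_hypergradient` are hypothetical. For this problem the implicit function theorem gives dF/dtheta = -x*^T H^{-1} (x* - x_target) with H = A^T A + theta*I. Both the lower-level solve and the linear solve are deliberately truncated after a chosen number of iterations, so the result is an inexact hypergradient whose accuracy depends on those two budgets.

```python
# Toy sketch only: hypothetical names and problem, not the paper's method.

def matvec(M, v):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def regularized_hessian(A, theta):
    """H = A^T A + theta * I, the Hessian in x of the lower-level objective."""
    rows, cols = len(A), len(A[0])
    return [
        [sum(A[k][i] * A[k][j] for k in range(rows)) + (theta if i == j else 0.0)
         for j in range(cols)]
        for i in range(cols)
    ]


def fixed_step_solve(H, rhs, step, iters):
    """Approximately solve H z = rhs with `iters` steps of z <- z - step*(H z - rhs)."""
    z = [0.0] * len(rhs)
    for _ in range(iters):
        Hz = matvec(H, z)
        z = [z_i - step * (Hz_i - r_i) for z_i, Hz_i, r_i in zip(z, Hz, rhs)]
    return z


def inexact_hypergradient(A, b, x_target, theta, step, inner_iters, solve_iters):
    """Approximate dF/dtheta using truncated lower-level and linear solves."""
    H = regularized_hessian(A, theta)
    At_b = matvec([list(col) for col in zip(*A)], b)
    # Inexact lower-level solve: gradient descent on g, whose gradient is H x - A^T b.
    x = fixed_step_solve(H, At_b, step, inner_iters)
    # Gradient of the upper-level objective at the approximate minimizer.
    grad_f = [x_i - t_i for x_i, t_i in zip(x, x_target)]
    # Inexact solve of the linear system H q = grad_f.
    q = fixed_step_solve(H, grad_f, step, solve_iters)
    # d/dtheta of grad_x g equals x, so the IFT formula gives dF/dtheta ~= -x^T q.
    return -sum(x_i * q_i for x_i, q_i in zip(x, q))


# Hypothetical data; the step size must be below 2 / (largest eigenvalue of H).
A = [[2.0, 0.0], [0.0, 1.0]]
b = [2.0, 1.0]
x_target = [0.5, 0.5]
rough = inexact_hypergradient(A, b, x_target, 1.0, 0.2, inner_iters=2, solve_iters=2)
accurate = inexact_hypergradient(A, b, x_target, 1.0, 0.2, inner_iters=200, solve_iters=200)
print(rough, accurate)
```

Comparing `rough` with `accurate` gives a feel for how the two iteration budgets trade cost against hypergradient error, which is the kind of trade-off the abstract's error bounds are concerned with.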