Analyzing inexact hypergradients for bilevel learning

Bibliographic Details
Published in IMA Journal of Applied Mathematics Vol. 89; no. 1; pp. 254-278
Main Authors Ehrhardt, Matthias J, Roberts, Lindon
Format Journal Article
Language English
Published Oxford University Press 21.06.2024
Abstract Estimating hyperparameters has been a long-standing problem in machine learning. We consider the case where the task at hand is modeled as the solution to an optimization problem. Here the exact gradient with respect to the hyperparameters cannot be feasibly computed and approximate strategies are required. We introduce a unified framework for computing hypergradients that generalizes existing methods based on the implicit function theorem and automatic differentiation/backpropagation, showing that these two seemingly disparate approaches are actually tightly connected. Our framework is extremely flexible, allowing its subproblems to be solved with any suitable method, to any degree of accuracy. We derive a priori and computable a posteriori error bounds for all our methods and numerically show that our a posteriori bounds are usually more accurate. Our numerical results also show that, surprisingly, for efficient bilevel optimization, the choice of hypergradient algorithm is at least as important as the choice of lower-level solver.
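As a rough illustration of what an inexact hypergradient is, the sketch below applies the implicit function theorem to a toy problem. It is not the authors' framework or code: the ridge-regression lower level, the data A, b, x_true, and every function name are assumptions chosen for illustration. Both the lower-level problem and the linear system from the implicit function theorem are solved by a truncated conjugate gradient method, so the iteration counts control how inexact the resulting hypergradient is.

```python
# Toy bilevel problem (all data and names are illustrative assumptions):
#   lower level: x*(theta) = argmin_x 0.5*||A x - b||^2 + 0.5*theta*||x||^2
#   upper level: F(theta)  = 0.5*||x*(theta) - x_true||^2
A = [[2.0, 0.5], [0.5, 1.0], [0.0, 1.5]]
b = [1.0, 2.0, 0.5]
x_true = [0.3, 0.9]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def hess_vec(theta, v):
    """Apply the lower-level Hessian H = A^T A + theta*I to the vector v."""
    Av = [dot(row, v) for row in A]
    AtAv = [sum(A[i][j] * Av[i] for i in range(len(A))) for j in range(len(v))]
    return [AtAv[j] + theta * v[j] for j in range(len(v))]


def At_b():
    """Right-hand side A^T b of the lower-level optimality condition."""
    return [sum(A[i][j] * b[i] for i in range(len(A))) for j in range(len(A[0]))]


def cg(theta, c, iters):
    """Conjugate gradient for H q = c, truncated after `iters` steps."""
    q = [0.0] * len(c)
    r = list(c)
    p = list(r)
    rs = dot(r, r)
    for _ in range(iters):
        if rs < 1e-24:  # residual already negligible
            break
        Hp = hess_vec(theta, p)
        alpha = rs / dot(p, Hp)
        q = [qi + alpha * pi for qi, pi in zip(q, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return q


def hypergradient(theta, lower_iters, linear_iters):
    """Approximate dF/dtheta via the implicit function theorem."""
    # Inexact lower-level solve of H x = A^T b.
    x = cg(theta, At_b(), lower_iters)
    # Gradient of the upper-level loss at the approximate solution.
    grad_f = [xi - ti for xi, ti in zip(x, x_true)]
    # Inexact solve of the linear system H q = grad_f.
    q = cg(theta, grad_f, linear_iters)
    # d/dtheta of the lower-level gradient is x, so dF/dtheta = -x^T q.
    return -dot(x, q)


def upper_loss(theta):
    """Upper-level loss with the lower level solved to high accuracy."""
    x = cg(theta, At_b(), 10)
    return 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, x_true))


theta = 0.1
reference = hypergradient(theta, 10, 10)
for k in (1, 2):
    approx = hypergradient(theta, k, k)
    print(k, "CG steps per solve, error:", abs(approx - reference))
```

In this two-dimensional toy, conjugate gradients is exact after two steps in exact arithmetic, so only the one-step run shows a visible error. The sketch is meant only to make the roles of the two subproblem accuracies concrete; it does not reproduce the error bounds discussed in the abstract.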
Authors
Ehrhardt, Matthias J (ORCID 0000-0001-8523-353X; m.ehrhardt@bath.ac.uk)
Roberts, Lindon (ORCID 0000-0001-6438-9703; lindon.roberts@sydney.edu.au)
ContentType Journal Article
Copyright The Author(s) 2023. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved. 2023
DOI 10.1093/imamat/hxad035
Discipline Mathematics
EISSN 1464-3634
EndPage 278
ISSN 0272-4960
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Keywords automatic differentiation
hyperparameter optimization
bilevel optimization
Language English
License This is an Open Access article distributed under the terms of the Creative Commons Attribution NonCommercial-NoDerivs licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
https://creativecommons.org/licenses/by-nc-nd/4.0
OpenAccessLink https://dx.doi.org/10.1093/imamat/hxad035
PageCount 25
PublicationDate 2024-06-21
PublicationTitle IMA journal of applied mathematics
PublicationYear 2024
Publisher Oxford University Press