Rényi Divergence Deep Mutual Learning

Bibliographic Details
Main Authors Huang, Weipeng; Tao, Junjie; Deng, Changbo; Fan, Ming; Wan, Wenqiang; Xiong, Qi; Piao, Guangyuan
Format Journal Article
Language English
Published 13.09.2022
Subjects Computer Science - Artificial Intelligence; Computer Science - Learning
Online Access https://arxiv.org/abs/2209.05732

Abstract This paper revisits Deep Mutual Learning (DML), a simple yet effective computing paradigm. We propose using the Rényi divergence instead of the KL divergence, which is more flexible and tunable, to improve vanilla DML. This modification is able to consistently improve performance over vanilla DML with limited additional complexity. The convergence properties of the proposed paradigm are analyzed theoretically, and Stochastic Gradient Descent with a constant learning rate is shown to converge with $\mathcal{O}(1)$-bias in the worst-case scenario for nonconvex optimization tasks. That is, learning will reach nearby local optima but continue searching within a bounded scope, which may help mitigate overfitting. Finally, our extensive empirical results demonstrate the advantage of combining DML and the Rényi divergence, leading to further improvement in model generalization.
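The abstract describes replacing the KL-based mimicry term of Deep Mutual Learning with a Rényi divergence of tunable order. The sketch below illustrates that idea for two peer classifiers; it assumes PyTorch, and the helper names (renyi_divergence, mutual_learning_losses), the argument order of the divergence, and the detaching of the peer's prediction are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def renyi_divergence(p, q, alpha=0.5, eps=1e-12):
    """Renyi divergence D_alpha(P || Q) between categorical distributions.

    p, q: tensors of shape (batch, num_classes) whose rows sum to 1.
    alpha: divergence order; must differ from 1 (alpha -> 1 recovers the KL divergence).
    """
    p = p.clamp(min=eps)  # avoid log(0) / 0**alpha issues
    q = q.clamp(min=eps)
    return torch.log((p.pow(alpha) * q.pow(1.0 - alpha)).sum(dim=-1)) / (alpha - 1.0)

def mutual_learning_losses(logits_a, logits_b, targets, alpha=0.5):
    """DML-style objective for two peer networks: supervised cross-entropy
    plus a mimicry term, with a Renyi divergence in place of the usual KL."""
    p_a = F.softmax(logits_a, dim=-1)
    p_b = F.softmax(logits_b, dim=-1)
    # Each network mimics its (detached) peer in addition to fitting the labels.
    loss_a = F.cross_entropy(logits_a, targets) + renyi_divergence(p_b.detach(), p_a, alpha).mean()
    loss_b = F.cross_entropy(logits_b, targets) + renyi_divergence(p_a.detach(), p_b, alpha).mean()
    return loss_a, loss_b
```

In a DML-style training loop the two networks would be updated on their respective losses, with alpha tuned as a hyperparameter; the paper's exact formulation and update schedule are not specified in this record.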
Copyright http://creativecommons.org/licenses/by/4.0
DOI 10.48550/arxiv.2209.05732
SecondaryResourceType preprint
SourceID arxiv
SourceType Open Access Repository