Rényi Divergence Deep Mutual Learning

Bibliographic Details
Main Authors Huang, Weipeng; Tao, Junjie; Deng, Changbo; Fan, Ming; Wan, Wenqiang; Xiong, Qi; Piao, Guangyuan
Format Journal Article
Language English
Published 13.09.2022
Subjects Computer Science - Artificial Intelligence; Computer Science - Learning
Online Access https://arxiv.org/abs/2209.05732

Abstract This paper revisits Deep Mutual Learning (DML), a simple yet effective computing paradigm. We propose using the Rényi divergence instead of the KL divergence, which is more flexible and tunable, to improve vanilla DML. This modification is able to consistently improve performance over vanilla DML with limited additional complexity. The convergence properties of the proposed paradigm are analyzed theoretically, and Stochastic Gradient Descent with a constant learning rate is shown to converge with $\mathcal{O}(1)$-bias in the worst-case scenario for nonconvex optimization tasks. That is, learning will reach nearby local optima but continue searching within a bounded scope, which may help mitigate overfitting. Finally, our extensive empirical results demonstrate the advantage of combining DML and the Rényi divergence, leading to further improvement in model generalization.
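The abstract describes replacing the KL-based mimicry term of Deep Mutual Learning with a Rényi divergence of tunable order. The sketch below illustrates that idea for two peer classifiers; it assumes PyTorch, and the helper names (renyi_divergence, mutual_learning_losses), the argument order of the divergence, and the detaching of the peer's prediction are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def renyi_divergence(p, q, alpha=0.5, eps=1e-12):
    """Renyi divergence D_alpha(P || Q) between categorical distributions.

    p, q: tensors of shape (batch, num_classes) whose rows sum to 1.
    alpha: divergence order; must differ from 1 (alpha -> 1 recovers the KL divergence).
    """
    p = p.clamp(min=eps)  # avoid log(0) / 0**alpha issues
    q = q.clamp(min=eps)
    return torch.log((p.pow(alpha) * q.pow(1.0 - alpha)).sum(dim=-1)) / (alpha - 1.0)

def mutual_learning_losses(logits_a, logits_b, targets, alpha=0.5):
    """DML-style objective for two peer networks: supervised cross-entropy
    plus a mimicry term, with a Renyi divergence in place of the usual KL."""
    p_a = F.softmax(logits_a, dim=-1)
    p_b = F.softmax(logits_b, dim=-1)
    # Each network mimics its (detached) peer in addition to fitting the labels.
    loss_a = F.cross_entropy(logits_a, targets) + renyi_divergence(p_b.detach(), p_a, alpha).mean()
    loss_b = F.cross_entropy(logits_b, targets) + renyi_divergence(p_a.detach(), p_b, alpha).mean()
    return loss_a, loss_b
```

In a DML-style training loop the two networks would be updated on their respective losses, with alpha tuned as a hyperparameter; the paper's exact formulation and update schedule are not specified in this record.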
Copyright http://creativecommons.org/licenses/by/4.0
DOI 10.48550/arxiv.2209.05732
SecondaryResourceType preprint
SourceID arxiv
SourceType Open Access Repository