Distributionally Robust Memory Evolution With Generalized Divergence for Continual Learning
Published in | IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 45, No. 12, pp. 14337-14352
Main Authors |
Format | Journal Article
Language | English
Published | New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.12.2023
ISSN | 0162-8828, 1939-3539, 2160-9292
DOI | 10.1109/TPAMI.2023.3317874 |
Summary | Continual learning (CL) aims to learn a non-stationary data distribution without forgetting previous knowledge. The effectiveness of existing approaches that rely on memory replay can decrease over time as the model tends to overfit the stored examples. As a result, the model's ability to generalize well is significantly constrained. Additionally, these methods often overlook the inherent uncertainty in the memory data distribution, which differs significantly from the distribution of all previous data examples. To overcome these issues, we propose a principled memory evolution framework that dynamically adjusts the memory data distribution. This evolution is achieved by employing distributionally robust optimization (DRO) to make the memory buffer increasingly difficult to memorize. We consider two types of constraints in DRO: $f$-divergence and Wasserstein ball constraints. For the $f$-divergence constraint, we derive a family of methods that evolve the memory buffer data in the continuous probability measure space with Wasserstein gradient flow (WGF). For the Wasserstein ball constraint, we solve it directly in Euclidean space. Extensive experiments on existing benchmarks demonstrate the effectiveness of the proposed methods for alleviating forgetting. As a by-product of the proposed framework, our method is more robust to adversarial examples than the compared CL methods.
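To make the framework concrete: the inner DRO problem described in the summary can be read as $\min_\theta \max_{Q:\, D(Q, P_{\mathrm{mem}}) \le \rho} \mathbb{E}_{(x,y)\sim Q}[\ell(f_\theta(x), y)]$, where $P_{\mathrm{mem}}$ is the memory-buffer distribution and $D$ is either an $f$-divergence or a Wasserstein distance. The sketch below is a minimal, hypothetical illustration of one memory-evolution step for the Wasserstein-ball variant solved in Euclidean space, approximating the inner maximization by a few gradient-ascent steps on the replay loss with an L2-ball projection around the original buffer examples. It is not the authors' implementation; `model`, `buffer_x`, `buffer_y`, `radius`, `step_size`, and `steps` are illustrative assumptions, and the $f$-divergence/WGF variant is not reproduced here.

```python
# Hypothetical sketch of DRO-style memory evolution (Wasserstein-ball variant,
# approximated in Euclidean space). Not the authors' code; all names and
# hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F

def evolve_memory(model, buffer_x, buffer_y, radius=0.1, step_size=0.05, steps=5):
    """Make stored examples harder to memorize via gradient ascent on the
    replay loss, constrained to an L2 ball around the original examples."""
    x0 = buffer_x.detach()
    x = x0.clone()
    for _ in range(steps):
        x.requires_grad_(True)
        loss = F.cross_entropy(model(x), buffer_y)
        grad, = torch.autograd.grad(loss, x)  # ascent direction on the replay loss
        with torch.no_grad():
            # normalized gradient-ascent step per example
            g_norm = grad.flatten(1).norm(dim=1).clamp(min=1e-12)
            x = x + step_size * grad / g_norm.view(-1, *[1] * (grad.dim() - 1))
            # project back into the L2 ball of the given radius around x0
            delta = x - x0
            d_norm = delta.flatten(1).norm(dim=1).clamp(min=1e-12)
            scale = (radius / d_norm).clamp(max=1.0)
            x = x0 + delta * scale.view(-1, *[1] * (delta.dim() - 1))
    return x.detach()

# Usage (hypothetical): evolved = evolve_memory(net, mem_images, mem_labels)
# The model would then be trained on the current task batch plus the evolved memory.
```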