Meta-Learning with Less Forgetting on Large-Scale Non-Stationary Task Distributions
Format | Journal Article |
---|---|
Language | English |
Published | 03.09.2022 |
DOI | 10.48550/arxiv.2209.01501 |
Summary:

The paradigm of machine intelligence is moving from purely supervised learning to a more practical scenario in which abundant loosely related unlabeled data are available and labeled data are scarce. Most existing algorithms assume that the underlying task distribution is stationary. Here we consider a more realistic and challenging setting in which task distributions evolve over time. We name this problem Semi-supervised meta-learning with Evolving Task diStributions, abbreviated as SETS. Two key challenges arise in this more realistic setting: (i) how to use unlabeled data in the presence of a large amount of unlabeled out-of-distribution (OOD) data; and (ii) how to prevent catastrophic forgetting of previously learned task distributions due to the task distribution shift. We propose an OOD Robust and knowleDge presErved semi-supeRvised meta-learning approach (ORDER) to tackle these two major challenges. Specifically, ORDER introduces a novel mutual information regularization to robustify the model with unlabeled OOD data and adopts an optimal transport regularization to retain previously learned knowledge in feature space. In addition, we test our method on a very challenging dataset: SETS on large-scale non-stationary semi-supervised task distributions consisting of (at least) 72K tasks. With extensive experiments, we demonstrate that the proposed ORDER alleviates forgetting on evolving task distributions and is more robust to OOD data than strong related baselines.
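The summary names a mutual information regularization for robustness to unlabeled OOD data but does not spell out its form. Below is a minimal sketch, assuming PyTorch, of one common mutual-information surrogate on unlabeled predictions: minimize the conditional entropy H(y|x) while maximizing the marginal entropy H(y). The function and variable names are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def mutual_info_regularizer(unlabeled_logits: torch.Tensor) -> torch.Tensor:
    """Generic MI surrogate on unlabeled predictions: H(y|x) - H(y).

    Minimizing this pushes each per-sample prediction to be confident
    while keeping the batch-level class marginal spread out -- a standard
    proxy for maximizing I(x; y_hat). Illustrative sketch only; not
    necessarily the exact regularizer used by ORDER.
    """
    probs = F.softmax(unlabeled_logits, dim=1)                        # (N, C)
    cond_ent = -(probs * torch.log(probs + 1e-8)).sum(dim=1).mean()   # H(y|x)
    marginal = probs.mean(dim=0)                                      # p(y)
    marg_ent = -(marginal * torch.log(marginal + 1e-8)).sum()         # H(y)
    return cond_ent - marg_ent
```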
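Likewise, the optimal transport regularization for remembering earlier task distributions in feature space could, under one common construction, be an entropic-regularized (Sinkhorn) OT distance between current features and stored features from previous distributions. The sketch below is an assumption-laden illustration; `feats_new`, `feats_old`, and the hyperparameters are hypothetical rather than the paper's.

```python
import torch

def ot_memory_regularizer(feats_new: torch.Tensor,
                          feats_old: torch.Tensor,
                          eps: float = 0.05,
                          n_iters: int = 50) -> torch.Tensor:
    """Entropic OT (Sinkhorn) distance between current and stored features.

    Penalizing this distance discourages the feature space from drifting
    away from what earlier task distributions relied on. Illustrative
    sketch, not the paper's exact formulation.
    """
    cost = torch.cdist(feats_new, feats_old, p=2) ** 2   # pairwise squared L2
    cost = cost / (cost.max() + 1e-8)                    # rescale for stability
    n, m = cost.shape
    a = torch.full((n,), 1.0 / n)                        # uniform source weights
    b = torch.full((m,), 1.0 / m)                        # uniform target weights
    K = torch.exp(-cost / eps)                           # Gibbs kernel
    u = torch.ones_like(a)
    for _ in range(n_iters):                             # Sinkhorn fixed point
        v = b / (K.t() @ u)
        u = a / (K @ v)
    plan = u.unsqueeze(1) * K * v.unsqueeze(0)           # transport plan
    return (plan * cost).sum()
```

One plausible way to combine the pieces, following the summary's description, is a total objective of the form `meta_loss + lam_mi * mutual_info_regularizer(...) + lam_ot * ot_memory_regularizer(...)`, where `lam_mi` and `lam_ot` are assumed trade-off weights not specified in this abstract.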