SparseGrow: Addressing Growth-Induced Forgetting in Task-Agnostic Continual Learning
Format | Journal Article |
Language | English |
Published | 20.08.2024 |
Summary | In continual learning (CL), model growth enhances adaptability to new data and improves knowledge retention across more tasks. However, improper model growth can severely degrade previously learned knowledge, an issue we term growth-induced forgetting (GIFt); it is especially acute in task-agnostic CL, where the entire grown model is used for inference. Existing works adopt model growth and random initialization for better adaptability, yet often fail to recognize the GIFt caused by improper growth. This oversight limits comprehensive control of forgetting and hinders full utilization of model growth. We are the first in CL to identify this issue and conduct an in-depth study of its root cause, finding that layer expansion, which widens layers without altering model functionality, stands out among model growth strategies. Yet direct adoption of layer expansion presents challenges: it lacks data-driven control and a principled initialization of the expanded parameters to balance adaptability and knowledge retention. This paper presents SparseGrow, a novel approach that overcomes GIFt while enhancing adaptability to new data. SparseGrow employs data-driven sparse layer expansion to control efficient parameter usage during growth, reducing the GIFt caused by excessive growth and functionality changes. It also combines sparse growth with on-data initialization at a late stage of training to create partially zero-valued expansions that fit the learned distribution, enhancing both retention and adaptability. To further minimize forgetting, freezing is applied by computing a sparse mask, allowing data-driven preservation of important parameters. Through experiments across datasets with various settings, cases, and task numbers, we demonstrate the necessity of layer expansion and the effectiveness of SparseGrow in overcoming GIFt, highlighting its adaptability and knowledge retention for incremental tasks. |
DOI | 10.48550/arxiv.2408.10566 |
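For intuition, here is a minimal, hypothetical PyTorch sketch of the two mechanisms the summary describes: zero-valued (function-preserving) layer expansion, and freezing important parameters through a sparse gradient mask. The helper names `expand_linear` and `freeze_with_mask` are illustrative, not from the paper, and the hand-built mask stands in for SparseGrow's data-driven sparsity criterion; in a full network, the downstream layer's new input columns would likewise be zero-initialized so the grown model's function on old inputs is unchanged.

```python
# Illustrative sketch only; not the paper's implementation.
import torch
import torch.nn as nn


def expand_linear(layer: nn.Linear, extra_out: int) -> nn.Linear:
    """Widen a linear layer by `extra_out` output units.

    The original weights and bias are copied and the new rows are
    zero-initialized, so the first `layer.out_features` outputs match
    the old layer's outputs exactly (function-preserving growth).
    """
    grown = nn.Linear(layer.in_features, layer.out_features + extra_out)
    with torch.no_grad():
        grown.weight.zero_()
        grown.bias.zero_()
        grown.weight[: layer.out_features] = layer.weight
        grown.bias[: layer.out_features] = layer.bias
    return grown


def freeze_with_mask(param: torch.Tensor, keep_mask: torch.Tensor) -> None:
    """Suppress updates wherever keep_mask == 1.

    keep_mask marks parameters deemed important for previous tasks;
    their gradients are zeroed while the rest remain trainable.
    """
    param.register_hook(lambda grad: grad * (1.0 - keep_mask))


if __name__ == "__main__":
    old = nn.Linear(4, 3)
    new = expand_linear(old, extra_out=2)

    x = torch.randn(8, 4)
    # The grown layer reproduces the old outputs on its first 3 units.
    assert torch.allclose(new(x)[:, :3], old(x), atol=1e-6)

    # Freeze the copied rows; in practice a data-driven criterion
    # would produce this mask rather than a hand-written slice.
    mask = torch.zeros_like(new.weight)
    mask[:3] = 1.0
    freeze_with_mask(new.weight, mask)

    new(x).sum().backward()
    # Gradients on the frozen rows are exactly zero.
    assert new.weight.grad[:3].abs().sum().item() == 0.0
```

Zero-initializing the expanded rows keeps the old outputs intact while still letting gradients flow into the new parameters, which is one reading of the retention/adaptability balance the summary attributes to partially zero-valued expansions.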