MoBoo: Memory-Boosted Vision Transformer for Class-Incremental Learning

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, p. 1
Main Authors: Ni, Bolin; Nie, Xing; Zhang, Chenghao; Xu, Shixiong; Zhang, Xin; Meng, Gaofeng; Xiang, Shiming
Format: Journal Article
Language: English
Published: IEEE, 20.06.2024
Summary: Continual learning strives to acquire knowledge across sequential tasks without forgetting previously assimilated knowledge. Current state-of-the-art methodologies utilize dynamic architectural strategies to increase the network capacity for new tasks. However, these approaches often suffer from a rapid growth in the number of parameters. While some methods introduce an additional network compression stage to address this, they tend to construct complex and hyperparameter-sensitive systems. In this work, we introduce a novel solution to this challenge by proposing the Memory-Boosted Transformer (MoBoo), instead of conventional architecture expansion and compression. Specifically, we design a memory-augmented attention mechanism by establishing a memory bank where the "key" and "value" linear projections are stored. This memory integration prompts the model to leverage previously learned knowledge, thereby enhancing stability during training at a marginal cost. The memory bank is lightweight and can be easily managed with a straightforward queue. Moreover, to increase the model's plasticity, we design a memory-attentive aggregator, which leverages the cross-attention mechanism to adaptively summarize the image representation from the encoder output in which historical knowledge is involved. Extensive experiments on challenging benchmarks demonstrate the effectiveness of our method. For example, on ImageNet-100 under 10 tasks, our method outperforms the current state-of-the-art methods by +3.74% in average accuracy while using fewer parameters.
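
The abstract describes two mechanisms: a queue-managed memory bank of past "key"/"value" projections folded into self-attention, and a cross-attention aggregator that summarizes the encoder output. The sketch below is a minimal PyTorch illustration of one plausible reading of that description; the module names (MemoryAugmentedAttention, MemoryAttentiveAggregator), the bank_size parameter, and the exact way stored projections are combined with the current ones are assumptions for illustration, not the paper's reference implementation.

```python
# Minimal sketch, assuming: one frozen snapshot of the key/value projection is
# queued per task, and attention runs over keys/values from the current plus all
# stored projections. Names and details are assumptions, not the paper's code.
import copy
from collections import deque

import torch
import torch.nn as nn


class MemoryAugmentedAttention(nn.Module):
    """Self-attention whose key/value projections from earlier tasks are kept in
    a FIFO memory bank, so current tokens also attend through old (frozen)
    projections."""

    def __init__(self, dim, num_heads=8, bank_size=4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)     # current task's key/value projection
        self.proj = nn.Linear(dim, dim)
        # Queue of frozen past kv projections (a real implementation would also
        # handle device/dtype placement, e.g. via nn.ModuleList).
        self.bank = deque(maxlen=bank_size)

    @torch.no_grad()
    def push_to_bank(self):
        """Snapshot the current kv projection after a task finishes; the queue
        evicts the oldest entry once bank_size is reached."""
        frozen = copy.deepcopy(self.kv)
        for p in frozen.parameters():
            p.requires_grad_(False)
        self.bank.append(frozen)

    def forward(self, x):                      # x: (B, N, dim)
        B, N, _ = x.shape
        q = self.q(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)

        # Keys/values from the current projection plus every stored projection.
        kvs = [self.kv(x)] + [m(x) for m in self.bank]
        k_list, v_list = [], []
        for kv in kvs:
            k, v = kv.chunk(2, dim=-1)
            k_list.append(k.reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2))
            v_list.append(v.reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2))
        k = torch.cat(k_list, dim=2)           # (B, heads, N * (1 + |bank|), head_dim)
        v = torch.cat(v_list, dim=2)

        attn = (q @ k.transpose(-2, -1)) * self.scale
        out = attn.softmax(dim=-1) @ v         # (B, heads, N, head_dim)
        out = out.transpose(1, 2).reshape(B, N, -1)
        return self.proj(out)


class MemoryAttentiveAggregator(nn.Module):
    """Learnable query that cross-attends over the encoder tokens to summarize
    the image representation."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.query = nn.Parameter(torch.zeros(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens):                 # tokens: (B, N, dim)
        q = self.query.expand(tokens.size(0), -1, -1)
        summary, _ = self.attn(q, tokens, tokens)
        return summary.squeeze(1)              # (B, dim)
```

Under these assumptions, push_to_bank() would be called once a task's training ends, so later tasks compute attention over both the new and the frozen projections, while the deque keeps the bank's size (and hence the parameter overhead) bounded.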
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2024.3417431