Efficient Test-Time Adaptation of Vision-Language Models
Main Authors | , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 27.03.2024 |
Subjects | |
Online Access | Get full text |
Summary: | Test-time adaptation with pre-trained vision-language models has attracted
increasing attention for tackling distribution shifts at test time.
Although prior studies have achieved promising performance, they involve
intensive computation that is poorly suited to test-time adaptation. We
design TDA, a training-free dynamic adapter that enables effective and
efficient test-time adaptation with vision-language models. TDA works with a
lightweight key-value cache that maintains a dynamic queue with few-shot pseudo
labels as values and the corresponding test-sample features as keys. Leveraging
the key-value cache, TDA adapts to test data gradually via progressive
pseudo-label refinement, which is highly efficient and incurs no
backpropagation. In addition, we introduce negative pseudo labeling, which
alleviates the adverse impact of pseudo-label noise by assigning pseudo labels
to certain negative classes when the model is uncertain about its pseudo-label
predictions. Extensive experiments over two benchmarks demonstrate TDA's
superior effectiveness and efficiency compared with the state of the art.
The code has been released at https://kdiaaa.github.io/tda/. |
---|---|
DOI: | 10.48550/arxiv.2403.18293 |
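
The key-value cache described in the summary can be illustrated with a minimal sketch. This is not the authors' released code. The class name `PseudoLabelCache`, the function `adapt`, the parameters `shots`, `alpha`, and `beta`, the entropy-based eviction rule, and the exponential affinity are all assumptions made for illustration. Features are assumed to be L2-normalized lists of floats, and negative pseudo labeling is left out.

```python
import math
from collections import defaultdict


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy, used here as a confidence score (lower = more confident)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class PseudoLabelCache:
    """Hypothetical cache: keys are test features, values are pseudo labels.

    Each class keeps at most `shots` features (assumed parameter), retaining
    the ones whose predictions had the lowest entropy (assumed eviction rule).
    """

    def __init__(self, num_classes, shots=3):
        self.num_classes = num_classes
        self.shots = shots
        self.slots = defaultdict(list)  # class index -> [(entropy, feature)]

    def update(self, feature, probs):
        # The pseudo label is the arg-max class of the current prediction.
        label = max(range(self.num_classes), key=lambda c: probs[c])
        items = self.slots[label]
        items.append((entropy(probs), feature))
        items.sort(key=lambda item: item[0])  # most confident first
        del items[self.shots:]                # evict the least confident
        return label

    def logits(self, feature, beta=5.0):
        # Assumed affinity: exp(-beta * (1 - cosine similarity)) per cached key,
        # accumulated on the class given by that key's pseudo label.
        out = [0.0] * self.num_classes
        for label, items in self.slots.items():
            for _, key in items:
                out[label] += math.exp(-beta * (1.0 - dot(feature, key)))
        return out


def adapt(zero_shot_logits, feature, cache, alpha=1.0):
    """Update the cache with one test sample, then add cache logits.

    No gradients are computed anywhere; `alpha` is an assumed mixing weight.
    """
    probs = softmax(zero_shot_logits)
    cache.update(feature, probs)
    cache_logits = cache.logits(feature)
    return [z + alpha * c for z, c in zip(zero_shot_logits, cache_logits)]
```

In this sketch every test sample costs one cache update and one pass over at most `num_classes * shots` cached keys, which is where a training-free approach of this kind would save time compared with methods that backpropagate at test time.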