Deep Ensembles for Low-Data Transfer Learning
| Field | Value |
|---|---|
| Main Authors | , , , , , |
| Format | Journal Article |
| Language | English |
| Published | 14.10.2020 |
Summary: In the low-data regime, it is difficult to train good supervised models from scratch. Instead practitioners turn to pre-trained models, leveraging transfer learning. Ensembling is an empirically and theoretically appealing way to construct powerful predictive models, but the predominant approach of training multiple deep networks with different random initialisations collides with the need for transfer via pre-trained weights. In this work, we study different ways of creating ensembles from pre-trained models. We show that the nature of pre-training itself is a performant source of diversity, and propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset. The approach is simple: use nearest-neighbour accuracy to rank pre-trained models, fine-tune the best ones with a small hyperparameter sweep, and greedily construct an ensemble to minimise validation cross-entropy. When evaluated together with strong baselines on 19 different downstream tasks (the Visual Task Adaptation Benchmark), this achieves state-of-the-art performance at a much lower inference budget, even when selecting from over 2,000 pre-trained models. We also assess our ensembles on ImageNet variants and show improved robustness to distribution shift.
DOI: 10.48550/arxiv.2010.06866
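The abstract describes a three-step recipe: rank pre-trained models by nearest-neighbour accuracy of their frozen features, fine-tune the top-ranked models with a small hyperparameter sweep, and greedily assemble an ensemble that minimises validation cross-entropy. The sketch below illustrates the ranking and greedy-selection steps in plain NumPy; the function names, data shapes, and the with-replacement greedy loop are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np


def knn_accuracy(train_feats, train_labels, val_feats, val_labels):
    """Proxy transferability score: 1-nearest-neighbour accuracy of a
    model's frozen features on the downstream data (assumed setup)."""
    # Squared Euclidean distances between every validation and training feature.
    d2 = ((val_feats[:, None, :] - train_feats[None, :, :]) ** 2).sum(-1)
    preds = train_labels[d2.argmin(axis=1)]
    return (preds == val_labels).mean()


def greedy_ensemble(member_probs, val_labels, max_size=5):
    """Greedily pick fine-tuned models (with replacement, an assumption of
    this sketch) so that the averaged predictive distribution minimises
    validation cross-entropy.

    member_probs: list of arrays of shape (n_val, n_classes), one per model.
    val_labels:   integer array of shape (n_val,).
    """
    n_val = len(val_labels)
    chosen = []
    prob_sum = np.zeros_like(member_probs[0])
    for _ in range(max_size):
        best_idx, best_ce = None, np.inf
        for i, p in enumerate(member_probs):
            avg = (prob_sum + p) / (len(chosen) + 1)
            # Cross-entropy of the averaged ensemble prediction on validation data.
            ce = -np.log(avg[np.arange(n_val), val_labels] + 1e-12).mean()
            if ce < best_ce:
                best_idx, best_ce = i, ce
        chosen.append(best_idx)
        prob_sum += member_probs[best_idx]
    return chosen
```

In this reading, `knn_accuracy` would be used to shortlist candidates from the pool of pre-trained models before any fine-tuning, and `greedy_ensemble` would then select which fine-tuned checkpoints to average at inference time, keeping the ensemble small and the inference budget low.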