FRIC: a framework for few-shot remote sensing image captioning
Published in | International journal of digital earth Vol. 17; no. 1 |
---|---|
Format | Journal Article |
Language | English |
Published | Taylor & Francis Group, 31.12.2024 |
Summary: | Training image captioning (IC) models requires a large number of caption-labeled samples, a requirement that is usually difficult to satisfy in real remote sensing scenarios, and model performance degrades under the resulting few-shot conditions. We describe the few-shot problems in remote sensing image captioning (RC) and design two research schemes. We then propose FRIC, a few-shot remote sensing image captioning framework. FRIC needs no additional samples and uses a simple base model. It aims to gain performance from split samples and to reduce the negative effects of noise. Unlike previous works that use 100% of the samples to simulate few-shot scenarios, FRIC uses less than 1.0% of the data to simulate actual few-shot scenarios. While previous works focus on improving the encoder, FRIC focuses on optimizing the decoder with parameter ensemble, multi-model ensemble and self-distillation. FRIC can train a simple base model with limited caption-labeled samples to generate captions that meet human expectations. FRIC shows clear advantages over other methods when trained with only 0.8% of the samples in RC datasets. No previous work has used such a small amount of data to train an RC model. In addition, the effectiveness of the components in FRIC is verified with ablation experiments. |
---|---|
ISSN: | 1753-8947, 1753-8955 |
DOI: | 10.1080/17538947.2024.2337240 |
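
The summary names parameter ensemble and multi-model ensemble as decoder-side components but does not say how they are computed. As a rough illustration only, the sketch below assumes one common reading: a parameter ensemble as the element-wise mean of weights from several checkpoints, and a multi-model ensemble as the mean of the next-token probability distributions predicted by several models. The function names, the `Params` type, and the flat-list weight layout are hypothetical and are not taken from the FRIC paper.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def average_parameters(checkpoints: List[Params]) -> Params:
    """Parameter-ensemble sketch: element-wise mean of weights across checkpoints.

    Assumes every checkpoint has the same parameter names and shapes.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    averaged: Params = {}
    for name in checkpoints[0]:
        # Group the i-th weight of this parameter from every checkpoint.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(col) / len(checkpoints) for col in columns]
    return averaged


def average_token_probs(model_probs: List[List[float]]) -> List[float]:
    """Multi-model-ensemble sketch: mean of per-model next-token distributions.

    Each inner list is one model's probability distribution over the vocabulary.
    """
    if not model_probs:
        raise ValueError("need at least one model distribution")
    return [sum(col) / len(model_probs) for col in zip(*model_probs)]


if __name__ == "__main__":
    ckpts = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    print(average_parameters(ckpts))
    print(average_token_probs([[0.5, 0.5], [1.0, 0.0]]))
```

In this reading, the parameter ensemble yields a single model at inference time, while the multi-model ensemble keeps several models and combines their outputs at each decoding step; the paper itself should be consulted for the actual procedure.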