FRIC: a framework for few-shot remote sensing image captioning



Bibliographic Details
Published in International Journal of Digital Earth, Vol. 17, no. 1
Main Authors Zhou, Haonan, Xia, Lurui, Du, Xiaoping, Li, Sen
Format Journal Article
Language English
Published Taylor & Francis Group 31.12.2024
Summary: Training image captioning (IC) models requires a large number of caption-labeled samples, a requirement that is usually difficult to meet in real remote sensing scenarios, and model performance degrades under such few-shot conditions. We describe the few-shot problem in remote sensing image captioning (RC) and design two research schemes. We then propose FRIC, a few-shot remote sensing image captioning framework. FRIC requires no additional samples and uses a simple base model. It aims to gain performance from split samples while reducing the negative effects of noise. Whereas previous works use 100% of the samples to simulate few-shot scenarios, FRIC uses less than 1.0% of the data to simulate realistic few-shot scenarios. And while previous works focus on improving the encoder, FRIC optimizes the decoder through parameter ensemble, multi-model ensemble and self-distillation. With limited caption-labeled samples, FRIC can train a simple base model to generate captions that meet human expectations. When trained with only 0.8% of the samples in RC datasets, FRIC shows clear advantages over other methods; no previous work has trained an RC model with so little data. In addition, ablation experiments verify the effectiveness of each component of FRIC.
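The summary names three decoder-side techniques without detailing them. The sketch below is a rough illustration of the generic forms these techniques often take, not FRIC's actual implementation. It assumes that a parameter ensemble means averaging the weights of several checkpoints of one architecture, that a multi-model ensemble means averaging the next-word distributions of separately trained decoders, and that self-distillation trains a student against the model's own soft predictions using a KL-divergence loss. All function names are hypothetical, and plain Python lists stand in for tensors.

```python
import math
from typing import Dict, List

# Hypothetical helpers illustrating generic ensemble and distillation steps;
# they are assumptions for illustration, not code from FRIC.


def average_parameters(checkpoints: List[Dict[str, List[float]]]) -> Dict[str, List[float]]:
    """Parameter ensemble (assumed form): element-wise mean of the weights
    of several checkpoints that share the same architecture."""
    n = len(checkpoints)
    return {
        name: [sum(ckpt[name][i] for ckpt in checkpoints) / n
               for i in range(len(checkpoints[0][name]))]
        for name in checkpoints[0]
    }


def ensemble_distribution(distributions: List[List[float]]) -> List[float]:
    """Multi-model ensemble (assumed form): mean of the next-word probability
    distributions predicted by several separately trained decoders."""
    n = len(distributions)
    return [sum(d[i] for d in distributions) / n for i in range(len(distributions[0]))]


def distillation_loss(teacher: List[float], student: List[float], eps: float = 1e-12) -> float:
    """Self-distillation target (assumed form): KL(teacher || student), where the
    teacher distribution comes from the model itself, e.g. an ensemble of its copies."""
    return sum(t * math.log((t + eps) / (s + eps)) for t, s in zip(teacher, student) if t > 0)


if __name__ == "__main__":
    print(average_parameters([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]))
    print(ensemble_distribution([[0.5, 0.5], [0.25, 0.75]]))
    print(distillation_loss([0.9, 0.1], [0.5, 0.5]))
```

Weight averaging yields a single model, so inference costs the same as the base model. Output averaging keeps several models and is paid for at inference time. That trade-off is one common reason to combine the two with distillation.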
ISSN: 1753-8947, 1753-8955
DOI:10.1080/17538947.2024.2337240