Multi-scale contrastive adaptor learning for segmenting anything in underperformed scenes

Bibliographic Details
Published in: Neurocomputing (Amsterdam), Vol. 606, p. 128395
Main Authors: Zhou, Ke; Qiu, Zhongwei; Fu, Dongmei
Format: Journal Article
Language: English
Published: Elsevier B.V., 14.11.2024
More Information
Summary: Foundational vision models, such as the Segment Anything Model (SAM), have achieved significant breakthroughs through extensive pre-training on large-scale visual datasets. Despite their general success, these models may fall short in specialized tasks with limited data, and fine-tuning such large-scale models is often not feasible. Current strategies involve incorporating adaptors into the pre-trained SAM to facilitate downstream task performance with minimal model adjustment. However, these strategies can be hampered by suboptimal learning approaches for the adaptors. In this paper, we introduce a novel Multi-scale Contrastive Adaptor learning method, named MCA-SAM, which enhances adaptor performance through a meticulously designed contrastive learning framework at both the token and sample levels. Our Token-level Contrastive adaptor (TC-adaptor) refines local representations by improving the discriminability of patch tokens, while the Sample-level Contrastive adaptor (SC-adaptor) amplifies global understanding across different samples. Together, these adaptors synergistically enhance feature comparison within and across samples, bolstering the model's representational strength and its ability to adapt to new tasks. Empirical results demonstrate that MCA-SAM sets new benchmarks, outperforming existing methods in three challenging domains: camouflaged object detection, shadow segmentation, and polyp segmentation. Specifically, MCA-SAM achieves substantial relative performance gains: a 20.0% improvement in MAE on the COD10K dataset, a 6.0% improvement in MAE on the CAMO dataset, a 15.4% improvement in BER on the ISTD dataset, and a 7.9% improvement in mDice on the Kvasir-SEG dataset.

Highlights:
• We introduce MCA-SAM, a novel representation learning framework that integrates adaptor and contrastive learning for the Segment Anything Model, effectively enhancing SAM's transferability in underperformed scenarios.
• We propose a multi-scale contrastive adaptor that incorporates token-level and sample-level contrastive learning, aimed at enhancing the perceptual acuity and discriminative ability between local image patches as well as among distinct samples. (A minimal code sketch of these two objectives follows this list.)
• Comprehensive experiments across four benchmarks in three challenging scenarios demonstrate that MCA-SAM surpasses prior methods by significant margins.
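To make the two contrastive objectives described above concrete, the following is a minimal PyTorch-style sketch of token-level and sample-level contrastive losses applied to adaptor-refined patch tokens from a frozen SAM-like encoder. The `Adaptor` bottleneck module, the InfoNCE formulation, the positive-pair construction from two augmented views, and the temperature value are illustrative assumptions for this sketch, not the paper's exact architecture or loss design.

```python
# Hedged sketch of token- and sample-level contrastive adaptor losses.
# All module names, shapes, and hyperparameters here are assumptions,
# not the exact MCA-SAM formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Adaptor(nn.Module):
    """Lightweight bottleneck adaptor inserted alongside a frozen encoder block."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) patch tokens; the residual keeps the frozen features.
        return tokens + self.up(F.gelu(self.down(tokens)))


def info_nce(anchor: torch.Tensor, positive: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """InfoNCE loss where row i of `anchor` matches row i of `positive`."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / tau                       # (M, M) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)


def token_contrastive_loss(tokens_v1, tokens_v2, tau=0.07):
    # Token level: the same patch position under two augmented views forms a
    # positive pair; all other patches in the batch act as negatives.
    B, N, D = tokens_v1.shape
    return info_nce(tokens_v1.reshape(B * N, D), tokens_v2.reshape(B * N, D), tau)


def sample_contrastive_loss(tokens_v1, tokens_v2, tau=0.07):
    # Sample level: mean-pooled image embeddings of the two views are positives;
    # other images in the batch are negatives (SimCLR-style).
    return info_nce(tokens_v1.mean(dim=1), tokens_v2.mean(dim=1), tau)


if __name__ == "__main__":
    adaptor = Adaptor(dim=256)
    v1, v2 = torch.randn(4, 196, 256), torch.randn(4, 196, 256)  # two augmented views
    t1, t2 = adaptor(v1), adaptor(v2)
    loss = token_contrastive_loss(t1, t2) + sample_contrastive_loss(t1, t2)
    loss.backward()                                # gradients flow only through the adaptor
    print(float(loss))
```

In practice, losses of this kind would be combined with the downstream segmentation loss, and, as the "multi-scale" name suggests, could be applied at several encoder stages; a single stage is shown here for brevity.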
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2024.128395