Make Segment Anything Model Perfect on Shadow Detection

Bibliographic Details
Published in	IEEE transactions on geoscience and remote sensing Vol. 61; pp. 1 - 13
Main Authors	Chen, Xiao-Diao; Wu, Wen; Yang, Wenya; Qin, Hongshuai; Wu, Xiantao; Mao, Xiaoyang
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2023
Subjects	Annotations; Curriculum; Curriculum learning; Detection; Feature extraction; Image segmentation; Labels; Lighting; Localization; noisy label; segment anything model (SAM); Segments; shadow detection; Shadows; Task analysis; Training; Training data; Tuning; unsupervised learning
Summary:	Compared to models pretrained on ImageNet, the segment anything model (SAM) has been trained on a massive segmentation corpus and excels in both generalization ability and boundary localization. However, these strengths alone are insufficient to improve shadow detection without additional training, which raises the question: do we still need precise manual annotations to fine-tune SAM for high detection accuracy? This article proposes an annotation-free framework for deep unsupervised shadow detection (USD) that leverages SAM's capabilities. The key lies in how to exploit the abilities acquired from a large-scale corpus and use them to improve downstream tasks. Instead of directly fine-tuning SAM, we propose a prompt-like tuning method, named ShadowSAM, that injects task-specific cues into SAM in a lightweight manner. This form of adaptation fits well even when training data are limited. Moreover, because the pseudo labels used in our framework are generated by traditional USD approaches and may contain severe label noise, we propose an illumination and texture-guided updating (ITU) strategy to selectively improve the quality of pseudo masks. To further improve the model's robustness, we design a mask diversity index (MDI) to establish easy-to-hard subsets for incremental curriculum learning. Extensive experiments on benchmark datasets (i.e., SBU, UCF, ISTD, and CUHK-Shadow) demonstrate that our unsupervised solution achieves performance comparable to state-of-the-art (SOTA) fully supervised methods. Our code is available at this repository.
ISSN:	0196-2892 1558-0644
DOI:	10.1109/TGRS.2023.3332257
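
Note on the summary: the easy-to-hard subsets for incremental curriculum learning can be pictured with a minimal sketch. This is only an illustration under stated assumptions, not the authors' implementation: the record does not give the MDI formula, so `difficulty` below is a hypothetical per-sample score standing in for it (lower means easier), and `curriculum_stages` is a name invented for this example.

```python
from typing import Dict, List


def curriculum_stages(difficulty: Dict[str, float], num_subsets: int) -> List[List[str]]:
    """Return cumulative easy-to-hard training stages.

    `difficulty` maps a sample id to a hypothetical difficulty score
    (an assumed stand-in for the MDI, whose definition is not given
    in this record); lower scores are treated as easier.
    """
    if num_subsets < 1:
        raise ValueError("num_subsets must be at least 1")
    # Order samples from easiest to hardest; ties are broken by id
    # so the result is deterministic.
    ordered = sorted(difficulty, key=lambda k: (difficulty[k], k))
    n = len(ordered)
    stages = []
    for i in range(1, num_subsets + 1):
        # Stage i keeps everything from earlier stages and adds the
        # next-hardest slice, so training data grows incrementally.
        end = (n * i) // num_subsets
        stages.append(ordered[:end])
    return stages


if __name__ == "__main__":
    scores = {"img_a": 0.9, "img_b": 0.1, "img_c": 0.5, "img_d": 0.3}
    for step, subset in enumerate(curriculum_stages(scores, 2), start=1):
        print(f"stage {step}: {subset}")
```

Each stage is a superset of the previous one, so a model would first be trained on the easiest samples and see harder ones only in later stages.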