SSR: SAM is a Strong Regularizer for domain adaptive semantic segmentation

We introduced SSR, which utilizes SAM (segment-anything) as a strong regularizer during training, to greatly enhance the robustness of the image encoder for handling various domains. Specifically, given the fact that SAM is pre-trained with a large number of images over the internet, which cover a d...

Full description

Saved in:

Bibliographic Details
Main Authors	Ge, Yanqi, Huang, Ye, Li, Wen, Duan, Lixin
Format	Journal Article
Language	English
Published	26.01.2024
Subjects	Computer Science - Computer Vision and Pattern Recognition
Online Access	Get full text

Cover

Loading…

More Information
Summary:	We introduced SSR, which utilizes SAM (segment-anything) as a strong regularizer during training, to greatly enhance the robustness of the image encoder for handling various domains. Specifically, given the fact that SAM is pre-trained with a large number of images over the internet, which cover a diverse variety of domains, the feature encoding extracted by the SAM is obviously less dependent on specific domains when compared to the traditional ImageNet pre-trained image encoder. Meanwhile, the ImageNet pre-trained image encoder is still a mature choice of backbone for the semantic segmentation task, especially when the SAM is category-irrelevant. As a result, our SSR provides a simple yet highly effective design. It uses the ImageNet pre-trained image encoder as the backbone, and the intermediate feature of each stage (ie there are 4 stages in MiT-B5) is regularized by SAM during training. After extensive experimentation on GTA5$\rightarrow$Cityscapes, our SSR significantly improved performance over the baseline without introducing any extra inference overhead.
DOI:	10.48550/arxiv.2401.14686