Weakly Supervised Semantic Segmentation of Remote Sensing Images Based on Progressive Mining and Saliency-Enhanced Self-Attention


Bibliographic Details
Published in: IEEE Geoscience and Remote Sensing Letters, Vol. 21, pp. 1-5
Main Authors: Hao, Ting; Bai, Shuya; Wu, Tianyu; Zhang, Libao
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2024

Summary: Given the high cost of generating pixel-level annotations, weakly supervised semantic segmentation (WSSS) has become an important approach to remote sensing image (RSI) interpretation. However, current methods are mostly borrowed from natural-scene studies and overlook the significant variation in object sizes, as well as the highly confusing intraclass heterogeneity and interclass homogeneity, that are characteristic of RSIs. In this letter, we propose a WSSS method based on progressive mining and saliency-enhanced self-attention (PMSA) to efficiently segment RSIs using only image-level labels. First, we exploit multiscale orientation patterns to extract the rich texture of RSIs, which helps discriminate between classes, and combine this information with contrast and luminance features to generate fine saliency maps. Second, we design a progressive mining process that gradually discovers both large objects, which carry representative semantics, and small objects, which are rich in patterns. Finally, we employ a self-attention mechanism to capture global dependencies in RSIs and refine category areas. To prevent attention from spreading erroneously between regions, we use saliency as a mask that separates the background from the object classes. Experiments on several datasets demonstrate the competitiveness of the proposed method, both quantitatively and visually.
ISSN: 1545-598X
EISSN: 1558-0571
DOI: 10.1109/LGRS.2024.3355957
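The saliency-masked self-attention described in the summary can be sketched as follows. This is an illustrative reading only, not the authors' implementation: the function name, the use of a simple threshold `tau` to binarize saliency, and unprojected dot-product attention are all assumptions made for the sketch.

```python
import numpy as np

def saliency_masked_self_attention(features, saliency, tau=0.5):
    """Self-attention over N pixel features, where a binary saliency mask
    restricts attention so that object pixels attend only to object pixels
    and background pixels only to background pixels.

    features : (N, C) array of pixel feature vectors
    saliency : (N,) array of saliency scores in [0, 1]
    tau      : threshold separating object (>tau) from background (assumed)
    """
    # Binary foreground/background partition from the saliency map.
    fg = saliency > tau                                          # (N,) bool

    # Plain scaled dot-product attention logits (no learned projections).
    logits = features @ features.T / np.sqrt(features.shape[1])  # (N, N)

    # Allow attention only within the same group, inhibiting the spread of
    # attention across the saliency boundary.
    same_group = fg[:, None] == fg[None, :]                      # (N, N) bool
    logits = np.where(same_group, logits, -np.inf)

    # Numerically stable row-wise softmax, then aggregate features.
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ features                                    # (N, C)
```

With this masking, a background pixel contributes zero weight to any object pixel's refined feature, which is the intent behind using saliency to discern background from object classes when propagating attention.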