SeMask-Mask2Former: A Semantic Segmentation Model for High Resolution Remote Sensing Images
Published in | 2023 IEEE Aerospace Conference pp. 1 - 6 |
---|---|
Main Authors | , , , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE 04.03.2023 |
Summary: | With the development of remote sensing, semantic segmentation of high-resolution remote sensing images (RSIs) is increasingly essential. At the same time, the characteristics of objects in RSIs, such as large size, variation in object scales, and complex details, make it necessary to capture both long-range context and local information. Some methods, such as Fully Convolutional Networks (FCN) and Pyramid Scene Parsing Network (PSPNet), lack the ability to capture long-range dependencies because of the limited receptive field of Convolutional Neural Networks (CNNs). In contrast, the self-attention mechanism in Transformer models, which captures the correlation between pixels, is remarkably capable of capturing long-range context. One of the most outstanding Transformer models is the Masked-attention Mask Transformer (Mask2Former), which adopts the mask classification method. We propose a model, SeMask-Mask2Former, with boundary loss. Semantically Masked (SeMask) is the model's backbone and Mask2Former is the decoder. Concretely, mask classification, which generates one or more masks for a specific category to perform fine-grained segmentation, is especially suitable for handling the large within-class and small inter-class variance characteristic of RSIs. Extensive experimental results show that SeMask-Mask2Former obtains better results in semantic segmentation of high-resolution RSIs on the ISPRS Potsdam dataset than CNN-based methods and other state-of-the-art Transformer-based methods. Extensive ablation studies conducted on the Potsdam dataset verify the contribution of each component and optimization strategy in SeMask-Mask2Former. |
---|---|
DOI: | 10.1109/AERO55745.2023.10115761 |
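
The summary describes mask classification as producing one or more masks per category. As a rough illustration only, the sketch below shows how mask-classification outputs are commonly merged into a per-pixel label map in MaskFormer-style models: each query predicts a class distribution and a binary mask, and the two are combined by a weighted sum. This is not taken from the paper; the function name `semantic_inference`, the array shapes, and the convention that the last class column means "no object" are assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def semantic_inference(class_logits, mask_logits):
    """Merge per-query predictions into a per-pixel label map.

    class_logits: (Q, K + 1) array; the last column is assumed to be "no object".
    mask_logits:  (Q, H, W) array of per-query binary mask logits.
    Returns an (H, W) integer array of class indices in [0, K).
    """
    # Drop the assumed "no object" column after normalizing.
    class_probs = softmax(class_logits, axis=-1)[:, :-1]  # (Q, K)
    mask_probs = sigmoid(mask_logits)                     # (Q, H, W)
    # Each pixel's class score sums over all queries, so several
    # masks can contribute to the same category.
    scores = np.einsum("qk,qhw->khw", class_probs, mask_probs)
    return scores.argmax(axis=0)


# Hypothetical toy input: two queries vote for class 0, one for class 1.
class_logits = np.array([[10.0, 0.0, 0.0],
                         [10.0, 0.0, 0.0],
                         [0.0, 10.0, 0.0]])
mask_logits = np.array([[[10.0, -10.0], [10.0, -10.0]],
                        [[-10.0, 10.0], [-10.0, -10.0]],
                        [[-10.0, -10.0], [-10.0, 10.0]]])
labels = semantic_inference(class_logits, mask_logits)
```

In this toy input, class 0 is covered by two separate masks, which is the property the summary points to for categories with large within-class variance.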