S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data


Bibliographic Details
Published in: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 27696-27705
Main Authors: Li, Xuyang; Hong, Danfeng; Chanussot, Jocelyn
Format: Conference Proceeding
Language: English
Published: IEEE, 16.06.2024
Summary: In the expansive domain of computer vision, a myriad of pretrained models are at our disposal. However, most of these models are designed for natural RGB images and prove inadequate for spectral remote sensing (RS) images. Spectral RS images have two main traits: (1) multiple bands capturing diverse feature information, (2) spatial alignment and consistent spectral sequencing within the spatial-spectral dimension. In this paper, we introduce Spatial-Spectral MAE (S2MAE), a specialized pretrained architecture for spectral RS imagery. S2MAE employs a 3D transformer for masked autoencoder modeling, integrating learnable spectral-spatial embeddings with a 90% masking ratio. The model efficiently captures local spectral consistency and spatial invariance using compact cube tokens, demonstrating versatility to diverse input characteristics. This adaptability facilitates progressive pretraining on extensive spectral datasets. The effectiveness of S2MAE is validated through continuous pretraining on two sizable datasets, totaling over a million training images. The pretrained model is subsequently applied to three distinct downstream tasks, with in-depth ablation studies conducted to emphasize its efficacy.
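The abstract's core mechanism, splitting a spectral cube into compact 3D cube tokens and masking 90% of them for MAE-style pretraining, can be sketched as follows. This is a minimal illustration assuming a toy (bands, H, W) array and a hypothetical cube size; it is not the paper's actual tokenizer or transformer.

```python
import numpy as np

def cube_tokenize(img, cube=(4, 8, 8)):
    """Split a (bands, H, W) spectral image into non-overlapping 3D cube tokens.

    The cube size (4 bands x 8 x 8 pixels) is an illustrative assumption,
    not the configuration used in the paper.
    """
    b, h, w = img.shape
    cb, ch, cw = cube
    assert b % cb == 0 and h % ch == 0 and w % cw == 0
    # Reshape so each non-overlapping cube becomes one flattened token vector.
    tokens = (img.reshape(b // cb, cb, h // ch, ch, w // cw, cw)
                 .transpose(0, 2, 4, 1, 3, 5)
                 .reshape(-1, cb * ch * cw))
    return tokens

def random_mask(tokens, ratio=0.9, seed=0):
    """Keep a random (1 - ratio) fraction of tokens, as in MAE-style masking.

    With ratio=0.9, only 10% of cube tokens stay visible to the encoder;
    the decoder would later reconstruct the masked 90%.
    """
    n = tokens.shape[0]
    n_keep = int(n * (1 - ratio))
    rng = np.random.default_rng(seed)
    keep = rng.permutation(n)[:n_keep]
    return tokens[keep], keep

# Toy spectral image: 32 bands, 32x32 pixels.
img = np.random.default_rng(0).random((32, 32, 32), dtype=np.float32)
tokens = cube_tokenize(img)                       # 128 tokens of length 4*8*8 = 256
visible, keep_idx = random_mask(tokens, ratio=0.9)
print(tokens.shape, visible.shape)
```

In an actual pretraining loop, only the visible tokens (plus positional/spectral embeddings) would be fed to the encoder, which is what makes a 90% masking ratio computationally cheap.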
ISSN:2575-7075
DOI:10.1109/CVPR52733.2024.02616