S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data
Published in | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp. 27696 - 27705 |
---|---|
Format | Conference Proceeding |
Language | English |
Published | IEEE, 16.06.2024 |
Summary: | In the expansive domain of computer vision, a myriad of pretrained models are at our disposal. However, most of these models are designed for natural RGB images and prove inadequate for spectral remote sensing (RS) images. Spectral RS images have two main traits: (1) multiple bands capturing diverse feature information, (2) spatial alignment and consistent spectral sequencing within the spatial-spectral dimension. In this paper, we introduce Spatial-SpectralMAE (S2MAE), a specialized pretrained architecture for spectral RS imagery. S2MAE employs a 3D transformer for masked autoencoder modeling, integrating learnable spectral-spatial embeddings with a 90% masking ratio. The model efficiently captures local spectral consistency and spatial invariance using compact cube tokens, demonstrating versatility to diverse input characteristics. This adaptability facilitates progressive pretraining on extensive spectral datasets. The effectiveness of S2MAE is validated through continuous pretraining on two sizable datasets, totaling over a million training images. The pretrained model is subsequently applied to three distinct downstream tasks, with in-depth ablation studies conducted to emphasize its efficacy. |
---|---|
ISSN: | 2575-7075 |
DOI: | 10.1109/CVPR52733.2024.02616 |
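
The summary mentions compact cube tokens and a 90% masking ratio. The following is a minimal NumPy sketch of what such a step could look like; it is not the authors' implementation. The function names `cube_tokens` and `random_mask`, the 12-band 96x96 input, and the 3x8x8 cube size are all hypothetical choices made for illustration, since the record does not state them.

```python
import numpy as np

def cube_tokens(image, band_size, patch_size):
    """Split a (bands, height, width) array into flattened 3D cube tokens."""
    c, h, w = image.shape
    assert c % band_size == 0 and h % patch_size == 0 and w % patch_size == 0
    cubes = image.reshape(c // band_size, band_size,
                          h // patch_size, patch_size,
                          w // patch_size, patch_size)
    # Move the three grid axes to the front, then flatten each cube.
    cubes = cubes.transpose(0, 2, 4, 1, 3, 5)
    return cubes.reshape(-1, band_size * patch_size * patch_size)

def random_mask(num_tokens, mask_ratio, rng):
    """Return sorted indices of visible tokens and of masked tokens."""
    num_keep = int(round(num_tokens * (1.0 - mask_ratio)))
    order = rng.permutation(num_tokens)
    return np.sort(order[:num_keep]), np.sort(order[num_keep:])

rng = np.random.default_rng(0)
image = rng.standard_normal((12, 96, 96))  # assumed 12-band, 96x96 input
tokens = cube_tokens(image, band_size=3, patch_size=8)  # assumed cube size
visible_idx, masked_idx = random_mask(len(tokens), mask_ratio=0.9, rng=rng)
visible_tokens = tokens[visible_idx]  # only these would reach an encoder
```

Under these assumptions the image yields 576 cube tokens, of which roughly a tenth stay visible. In a masked autoencoder the encoder processes only the visible tokens and a decoder reconstructs the masked ones, which is why a high masking ratio keeps pretraining cheap.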