Spectral-Spatial Feature Tokenization Transformer for Hyperspectral Image Classification

Bibliographic Details
Published in	IEEE Transactions on Geoscience and Remote Sensing, Vol. 60, pp. 1-14
Main Authors	Sun, Le; Zhao, Guangrui; Zheng, Yuhui; Wu, Zebin
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2022
Subjects	Artificial neural networks; Classification; Coders; Computation; Computer applications; Convolution; Convolutional neural networks; Convolutional neural networks (CNNs); Data mining; Deep learning; Feature extraction; hyperspectral image (HSI) classification; Hyperspectral imaging; Image classification; Land cover; Machine learning; Methods; Modules; Neural networks; Principal component analysis; semantic features; Semantics; Spectra; spectral–spatial tokenization transformer; Transformers
Summary:	In hyperspectral image (HSI) classification, each pixel sample is assigned to a land-cover category. In recent years, convolutional neural network (CNN)-based HSI classification methods have greatly improved performance thanks to their strong feature representation ability. However, these methods have limited ability to obtain deep semantic features, and computational costs rise significantly as the number of layers increases. The transformer framework can represent high-level semantic features well. In this article, a spectral-spatial feature tokenization transformer (SSFTT) method is proposed to capture spectral-spatial features and high-level semantic features. First, a spectral-spatial feature extraction module is built to extract low-level features. This module is composed of a 3-D convolution layer and a 2-D convolution layer, which are used to extract the shallow spectral and spatial features. Second, a Gaussian weighted feature tokenizer is introduced for feature transformation. Third, the transformed features are input into the transformer encoder module for feature representation and learning. Finally, a linear layer is applied to the first learnable token to obtain the sample label. Experiments on three standard datasets confirm that the computation time is lower than that of other deep learning methods and that the classification performance exceeds that of several current state-of-the-art methods. The code of this work is available at https://github.com/zgr6010/HSI_SSFTT for the sake of reproducibility.
ISSN:	0196-2892, 1558-0644
DOI:	10.1109/TGRS.2022.3144158
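
The tokenization step mentioned in the summary can be illustrated with a small sketch. This is not the authors' implementation (that is in the repository linked above); it is a simplified, dependency-free illustration under several assumptions that the summary does not state: the tokenizer weights are drawn from a standard normal distribution, the softmax is taken over pixels so that each token is a weighted average of pixel features, and the extra first token is shown as zeros rather than as a trained parameter. The names `gaussian_tokenizer` and `prepend_class_token` are hypothetical.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_tokenizer(features, num_tokens, seed=0):
    """Turn n pixel feature vectors (each of length c) into num_tokens tokens.

    Assumption: the weight matrix is sampled from N(0, 1); in a trained
    model it would be a learnable parameter initialised this way.
    """
    rng = random.Random(seed)
    n, c = len(features), len(features[0])
    weights = [[rng.gauss(0.0, 1.0) for _ in range(num_tokens)] for _ in range(c)]
    # scores[i][k]: affinity between pixel i and token k
    scores = [
        [sum(x[j] * weights[j][k] for j in range(c)) for k in range(num_tokens)]
        for x in features
    ]
    tokens = []
    for k in range(num_tokens):
        # Assumption: softmax over pixels, so each token is a convex
        # combination of the pixel feature vectors.
        attn = softmax([scores[i][k] for i in range(n)])
        tokens.append(
            [sum(attn[i] * features[i][j] for i in range(n)) for j in range(c)]
        )
    return tokens


def prepend_class_token(tokens):
    """Place an extra token first (zeros here; learnable in a real model)."""
    width = len(tokens[0])
    return [[0.0] * width] + tokens


# Three toy "pixels" with two feature channels each.
pixels = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
sequence = prepend_class_token(gaussian_tokenizer(pixels, num_tokens=4))
print(len(sequence), len(sequence[0]))  # 5 2
```

In the method described in the summary, a sequence like this would then pass through the transformer encoder, and a linear layer would read the first token to predict the label; those stages are omitted here.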