Decoupled Knowledge Distillation via Spatial Feature Blurring for Hyperspectral Image Classification

Bibliographic Details
Published in: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol. 17, pp. 8938-8955
Main Authors: Xie, Wen; Zhang, ZheZhe; Jiao, Licheng; Wang, Jin
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2024

Summary: It is well known that distillation learning can enhance the performance of a light (student) model by transferring knowledge from a heavy (teacher) model, without incurring additional computational and storage costs. This article proposes an improved decoupled knowledge distillation (DKD) strategy for hyperspectral image (HSI) classification. A spatial feature blurring (SFB) module is designed to improve the classification performance of the student network when using the DKD strategy. The SFB module uses randomly initialized 2-D standard normal distribution tensors to blur the spatial features of HSI, which increases the complexity of the data. This aligns with a characteristic of DKD: it transfers more useful knowledge when the samples are complex. To transfer knowledge effectively, this article also proposes a robust teacher network, the dual-branch spatial transformer-spectral transformer (DBSTST) network. This network describes the spatial and spectral long-range dependencies of HSI, addressing the limitation of convolutional neural networks, whose fixed receptive fields capture only local features. More specifically, the DBSTST network adopts a spatial transformer-spectral transformer built around a parallel spatial-spectral multihead self-attention (PS2MHSA) module, which describes pixel-level spatial long-range dependencies and spectral correlations in HSI. In addition, introducing spatial-spectral positional embedding into PS2MHSA enhances positional awareness. We demonstrate the effectiveness of the proposed method on four publicly available HSI datasets. The student network achieves improved classification performance and surpasses several other networks. Moreover, when compared with state-of-the-art classification methods, the DBSTST network also exhibits significant improvements in classification performance.
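The abstract describes the SFB module only at a high level: a randomly initialized 2-D standard normal tensor is used to blur the spatial features of an HSI patch, making the samples more complex and thereby playing to DKD's strength on complex samples. The PyTorch sketch below is one plausible reading of that description, not the authors' implementation; the kernel size, the per-band depthwise convolution, and the class name SpatialFeatureBlur are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialFeatureBlur(nn.Module):
    """Hypothetical sketch of spatial feature blurring (SFB).

    Assumption: each spatial feature map of an HSI patch is convolved with a
    fixed, randomly initialized 2-D standard normal kernel, as loosely
    described in the abstract. Kernel size and placement in the pipeline are
    guesses, not the published design.
    """

    def __init__(self, kernel_size: int = 3):
        super().__init__()
        self.kernel_size = kernel_size
        # 2-D tensor drawn from a standard normal distribution, kept fixed as a buffer.
        self.register_buffer("kernel", torch.randn(kernel_size, kernel_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, bands, height, width) HSI patch; blur every band with the same kernel.
        bands = x.shape[1]
        weight = self.kernel.view(1, 1, self.kernel_size, self.kernel_size).repeat(bands, 1, 1, 1)
        return F.conv2d(x, weight, padding=self.kernel_size // 2, groups=bands)


if __name__ == "__main__":
    patch = torch.randn(4, 200, 9, 9)   # 4 samples, 200 bands, 9x9 spatial window
    blurred = SpatialFeatureBlur()(patch)
    print(blurred.shape)                # torch.Size([4, 200, 9, 9])
```

How the blurred patch is wired into training (for example, whether it replaces or augments the clean patch fed to the teacher and student before the DKD loss is computed) is likewise an assumption not specified by the abstract.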
ISSN: 1939-1404, 2151-1535
DOI: 10.1109/JSTARS.2024.3383854