A Hybrid Siamese Network With Spatiotemporal Enhancement and Two-Level Feature Fusion for Remote Sensing Image Change Detection

Bibliographic Details
Published in IEEE Transactions on Geoscience and Remote Sensing, Vol. 61, pp. 1-17
Main Authors Yan, Liangliang, Jiang, Jie
Format Journal Article
Language English
Published New York IEEE 2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: With the spread of deep learning (DL), remote sensing (RS) image change detection (CD) has advanced considerably. Accurate CD nevertheless remains challenging because efficient feature extraction and effective enhancement and refinement of difference features are difficult to achieve. To address these limitations, this article proposes a hybrid Siamese network with spatiotemporal enhancement and two-level feature fusion (HSSENet) for CD. First, an efficient hybrid Siamese backbone combines the transformer's ability to capture dense dependencies between features with the convolutional neural network's (CNN's) ability to provide local prior knowledge. Second, to suppress irrelevant pseudo-changes and high-frequency noise while keeping changed targets compact, a spatiotemporal enhancement module (STEM) enhances the difference features; it uses self-attention for context modeling across the spatiotemporal dimensions and processes low and high frequencies separately. Finally, three two-level feature fusion modules (TL-FFMs) replace standard decoders and aggregate low-level details with high-level semantics to refine boundary information. Experiments show that HSSENet achieves a better tradeoff between accuracy and efficiency than state-of-the-art methods and significantly outperforms them, with F1-scores of 91.48, 91.55, and 91.17 points on the learning, vision, and RS (LEVIR), Wuhan University (WHU), and deeply supervised image fusion network (DSIFN) test sets, respectively.
ISSN: 0196-2892, 1558-0644
DOI: 10.1109/TGRS.2023.3268294
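
The summary reports results as F1-scores in points (percent). As a minimal sketch of how such a score is commonly computed for binary change maps, the following assumes the F1 is taken over the "changed" class from pixel-wise true positives, false positives, and false negatives; the record does not state the exact evaluation protocol, and the function name `f1_score` and the toy masks are illustrative, not taken from the article.

```python
def f1_score(pred, target):
    """F1 of the 'changed' class for two binary masks (2D lists of 0/1).

    Assumption: 1 marks a changed pixel, 0 an unchanged pixel.
    """
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == 1 and t == 1:
                tp += 1  # change correctly detected
            elif p == 1 and t == 0:
                fp += 1  # pseudo-change (false alarm)
            elif p == 0 and t == 1:
                fn += 1  # missed change
    denom = 2 * tp + fp + fn
    if denom == 0:
        return 0.0  # no changed pixels predicted or present
    # Equivalent to 2 * precision * recall / (precision + recall)
    return 2 * tp / denom


# Toy 3x3 example (hypothetical data, not from the article)
pred = [[1, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
target = [[1, 0, 0],
          [0, 1, 1],
          [0, 0, 0]]

score = f1_score(pred, target)
print(f"F1 = {100 * score:.2f} points")
```

In the toy example there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F1 is 2/3 as well.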