A Hybrid Siamese Network With Spatiotemporal Enhancement and Two-Level Feature Fusion for Remote Sensing Image Change Detection

Bibliographic Details
Published in	IEEE transactions on geoscience and remote sensing Vol. 61, pp. 1-17
Main Authors	Yan, Liangliang; Jiang, Jie
Format	Journal Article
Language	English
Published	New York: IEEE, 2023; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Artificial neural networks; Change detection; Change detection (CD); Computational modeling; Computer vision; Context modeling; Decoders; Decoding; Deep learning; Detection; Dimensions; Feature extraction; hybrid Siamese backbone; Image enhancement; Machine learning; Modules; Neural networks; Remote sensing; Semantics; spatiotemporal enhancement module (STEM); Spatiotemporal phenomena; Task analysis; two-level feature fusion module (TL-FFM)
Summary:	With the popularization and development of deep learning (DL) technology, remote sensing (RS) image change detection (CD) has achieved remarkable success. However, accurate CD remains challenging because efficient feature extraction and effective enhancement and refinement of difference features are difficult to achieve. To address these limitations, this article proposes a hybrid Siamese network with spatiotemporal enhancement and two-level feature fusion (named HSSENet) for CD. First, an efficient hybrid Siamese backbone is designed by combining a transformer's ability to capture dense dependencies between features with a convolutional neural network's (CNN's) ability to provide local prior knowledge. In addition, to reduce irrelevant pseudo-changes and high-frequency noise while maintaining the high compactness of changed targets, a spatiotemporal enhancement module (STEM) is proposed for effective difference feature enhancement; it adopts the self-attention mechanism for context modeling in the spatiotemporal dimensions and can process low and high frequencies separately. Finally, three two-level feature fusion modules (TL-FFMs) are designed to replace standard decoders; they aggregate low-level details and high-level semantics to refine boundary information. Experimental results demonstrate that the proposed HSSENet obtains a better tradeoff between accuracy and efficiency than state-of-the-art methods and significantly outperforms them, with F1-scores of 91.48, 91.55, and 91.17 points on the learning, vision, and RS (LEVIR), Wuhan University (WHU), and deeply supervised image fusion network (DSIFN) test sets, respectively.
ISSN:	0196-2892 1558-0644
DOI:	10.1109/TGRS.2023.3268294
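
Note on the reported metric: the summary gives F1-scores in points but this record does not say how they are computed. The sketch below assumes the conventional pixel-wise F1 for the "changed" class, computed from a predicted binary change map and a ground-truth mask. The function name `f1_score` and the toy masks are hypothetical and are not taken from the article.

```python
def f1_score(pred, truth):
    """Pixel-wise F1 for the 'changed' class (label 1).

    pred and truth are equally sized 2-D lists of 0/1 labels.
    Assumption: this is the standard definition, not necessarily
    the exact evaluation protocol used in the article.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of rows")
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        if len(p_row) != len(t_row):
            raise ValueError("masks must have the same number of columns")
        for p, t in zip(p_row, t_row):
            if p == 1 and t == 1:
                tp += 1  # changed pixel correctly detected
            elif p == 1 and t == 0:
                fp += 1  # pseudo-change (false alarm)
            elif p == 0 and t == 1:
                fn += 1  # missed change
    denom = 2 * tp + fp + fn
    # Convention chosen here: return 0.0 when no changed pixels
    # appear in either mask.
    return 2 * tp / denom if denom else 0.0


# Toy 2x3 masks (hypothetical data): 2 hits, 1 false alarm, 1 miss.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
score = f1_score(pred, truth)
print(f"F1 = {100 * score:.2f} points")
```

Multiplying the returned fraction by 100 gives a value on the same "points" scale used in the summary.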