YDTR: Infrared and Visible Image Fusion via Y-Shape Dynamic Transformer


Bibliographic Details
Published in: IEEE Transactions on Multimedia, Vol. 25, pp. 5413-5428
Main Authors: Tang, Wei; He, Fazhi; Liu, Yu
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2023
More Information
Summary: Infrared and visible image fusion aims to generate a composite image that simultaneously describes the salient targets in the infrared image and the texture details in the visible image of the same scene. Since deep learning (DL) exhibits great feature-extraction ability in computer vision tasks, it has also been widely employed for the infrared and visible image fusion problem. However, existing DL-based methods generally extract complementary information from the source images through convolutional operations, which limits the preservation of global features. To this end, we propose a novel infrared and visible image fusion method, i.e., the Y-shape dynamic Transformer (YDTR). Specifically, a dynamic Transformer module (DTRM) is designed to acquire not only local features but also significant context information. Furthermore, the proposed network is devised in a Y-shape to comprehensively maintain the thermal-radiation information from the infrared image and the scene details from the visible image. Considering the specific information provided by the source images, we design a loss function that consists of two terms to improve fusion quality: a structural similarity (SSIM) term and a spatial frequency (SF) term. Extensive experiments on mainstream datasets illustrate that the proposed method outperforms both classical and state-of-the-art approaches in qualitative and quantitative assessments. We further extend YDTR to infrared and RGB-visible image pairs and to multi-focus images without fine-tuning, and the satisfactory fusion results demonstrate that the proposed method has good generalization capability.
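The two-term loss named in the summary (an SSIM term plus a spatial frequency term) can be illustrated with a minimal PyTorch sketch. This is not the paper's implementation: the exact SF formulation, the way the SSIM term balances the two source images, and the weights alpha and beta are assumptions, and ssim_fn stands for any differentiable SSIM routine (e.g. the ssim function from the pytorch-msssim package).

import torch

def spatial_frequency(img):
    # Spatial frequency (SF): root-mean-square of horizontal and vertical
    # first-order differences, a standard proxy for gradient/texture richness.
    row_diff = img[..., :, 1:] - img[..., :, :-1]
    col_diff = img[..., 1:, :] - img[..., :-1, :]
    return torch.sqrt(row_diff.pow(2).mean() + col_diff.pow(2).mean())

def fusion_loss(fused, ir, vis, ssim_fn, alpha=1.0, beta=1.0):
    # SSIM term: keep the fused image structurally close to both source images.
    ssim_term = (1.0 - ssim_fn(fused, ir)) + (1.0 - ssim_fn(fused, vis))
    # SF term: reward high spatial frequency (rich texture) in the fused image.
    sf_term = -spatial_frequency(fused)
    # alpha and beta are illustrative balancing weights, not the paper's values.
    return alpha * ssim_term + beta * sf_term

With batched single-channel tensors fused, ir, and vis of shape (N, 1, H, W) scaled to [0, 1], a call such as fusion_loss(fused, ir, vis, ssim_fn=lambda a, b: pytorch_msssim.ssim(a, b, data_range=1.0)) returns a scalar that can be backpropagated through the fusion network.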
ISSN: 1520-9210
EISSN: 1941-0077
DOI: 10.1109/TMM.2022.3192661