DATFuse: Infrared and Visible Image Fusion via Dual Attention Transformer

Bibliographic Details
Published in	IEEE Transactions on Circuits and Systems for Video Technology, Vol. 33, no. 7, pp. 3159-3172
Main Authors	Tang, Wei; He, Fazhi; Liu, Yu; Duan, Yansong; Si, Tongzhen
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 1 July 2023
Subjects	attention mechanism; Computational modeling; Computer vision; Decoding; Feature extraction; Image fusion; infrared image; Infrared imagery; Modules; residual learning; Source code; Spatial resolution; Target detection; Task analysis; Thermal radiation; transformer; Transformers; Transmission line measurements; Weather

More Information
Summary:	The fusion of infrared and visible images aims to generate a composite image that simultaneously contains the thermal radiation information of the infrared image and the rich texture details of the visible image, so that targets can be detected under various weather conditions while retaining high spatial resolution of the scene. Previous deep fusion models were generally based on convolutional operations, which limits their ability to represent long-range context information. In this paper, we propose a novel end-to-end model for infrared and visible image fusion via a dual attention Transformer, termed DATFuse. To accurately attend to the significant regions of the source images, a dual attention residual module (DARM) is designed for important feature extraction. To further model long-range dependencies, a Transformer module (TRM) is devised to preserve global complementary information. Moreover, a loss function consisting of three terms, namely pixel loss, gradient loss, and structural loss, is designed to train the proposed model in an unsupervised manner. This avoids the complicated, manually designed activity-level measurements and fusion strategies of traditional image fusion methods. Extensive experiments on public datasets show that DATFuse outperforms other representative state-of-the-art approaches in both qualitative and quantitative assessments. The proposed model is also extended to other infrared and visible image fusion tasks without fine-tuning, and the promising results demonstrate good generalization ability. The source code is available at https://github.com/tthinking/DATFuse.
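The summary names the three loss terms (pixel, gradient, structural) but not their exact form. The sketch below is a minimal illustration assuming choices that are common in unsupervised fusion work, not necessarily the DATFuse definitions: an L1 pixel term against the element-wise maximum of the two sources, an L1 term between the Sobel gradient magnitude of the fused image and the element-wise maximum of the source gradient magnitudes, and a structural term of 1 minus SSIM averaged over both sources. For brevity it uses a single global SSIM window instead of the usual sliding Gaussian window. The function names `fusion_loss`, `sobel_magnitude`, and `global_ssim` and the weights `w_pix`, `w_grad`, and `w_ssim` are hypothetical; the authors' repository is the authority on the actual loss.

```python
import numpy as np


def sobel_magnitude(img: np.ndarray) -> np.ndarray:
    """Approximate gradient magnitude |Gx| + |Gy| with 3x3 Sobel kernels (edge padding)."""
    p = np.pad(img, 1, mode="edge")
    # Horizontal gradient: right column minus left column, weights 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical gradient: bottom row minus top row, weights 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.abs(gx) + np.abs(gy)


def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 1.0) -> float:
    """SSIM computed over the whole image as one window (simplification, assumed)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2)))


def fusion_loss(fused, ir, vis, w_pix=1.0, w_grad=1.0, w_ssim=1.0):
    """Hypothetical three-term unsupervised fusion loss; images are 2-D arrays in [0, 1]."""
    # Pixel term: keep the brighter (e.g. hot IR target) intensity at each location.
    l_pix = float(np.abs(fused - np.maximum(ir, vis)).mean())
    # Gradient term: keep the stronger texture/edge response of either source.
    target_grad = np.maximum(sobel_magnitude(ir), sobel_magnitude(vis))
    l_grad = float(np.abs(sobel_magnitude(fused) - target_grad).mean())
    # Structural term: 1 - SSIM, averaged over both sources.
    l_ssim = 1.0 - 0.5 * (global_ssim(fused, ir) + global_ssim(fused, vis))
    total = w_pix * l_pix + w_grad * l_grad + w_ssim * l_ssim
    return total, {"pixel": l_pix, "gradient": l_grad, "structural": l_ssim}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ir, vis = rng.random((32, 32)), rng.random((32, 32))
    # A naive average fusion, scored by the sketched loss.
    total, parts = fusion_loss(0.5 * (ir + vis), ir, vis)
    print(total, parts)
```

In an actual training setup the same terms would be written in an autodiff framework so gradients flow back into the network; the NumPy version only shows how the three terms combine into one scalar objective.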
ISSN:	1051-8215 (print); 1558-2205 (online)
DOI:	10.1109/TCSVT.2023.3234340