Learning a Deep Multi-Scale Feature Ensemble and an Edge-Attention Guidance for Image Fusion

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 32, No. 1, pp. 105-119
Main Authors: Liu, Jinyuan; Fan, Xin; Jiang, Ji; Liu, Risheng; Luo, Zhongxuan
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2022

Summary: Image fusion integrates a series of images acquired from different sensors, e.g., infrared and visible, outputting an image with richer information than either one. Traditional and recent deep-learning-based methods have difficulty preserving prominent structures and recovering vital textural details in practical applications. In this article, we propose a deep network for infrared and visible image fusion that cascades a feature learning module with a fusion learning mechanism. First, we apply a coarse-to-fine deep architecture to learn multi-scale features for multi-modal images, which enables discovering prominent common structures for later fusion operations. The proposed feature learning module requires no well-aligned image pairs for training. Compared with existing learning-based methods, it can draw on numerous training examples from each modality separately, strengthening its feature representation. Second, we design an edge-guided attention mechanism upon the multi-scale features to steer the fusion toward common structures, recovering details while attenuating noise. Moreover, we provide a new aligned infrared and visible image fusion dataset, RealStreet, collected in various practical scenarios for comprehensive evaluation. Extensive experiments on two benchmarks, TNO and RealStreet, demonstrate the superiority of the proposed method over the state of the art in terms of both visual inspection and objective analysis on six evaluation metrics. We also conduct experiments on the FLIR and NIR datasets, which contain foggy weather and poor lighting conditions, to verify the generalization and robustness of the proposed method.
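The edge-guided attention idea described in the summary lends itself to a compact illustration. Below is a minimal sketch, assuming PyTorch; the names sobel_edges and EdgeAttentionFusion are hypothetical, the Sobel operator and the single-scale setup are simplifying assumptions, and the paper's actual module operates on coarse-to-fine multi-scale features rather than this one-level gate.

```python
# Illustrative sketch (not the authors' released code) of edge-guided
# attention fusion: edge maps of the source images produce an attention
# map that re-weights the fused features toward common structures.
import torch
import torch.nn as nn
import torch.nn.functional as F


def sobel_edges(x: torch.Tensor) -> torch.Tensor:
    """Edge magnitude of a single-channel image batch via Sobel filters."""
    kx = torch.tensor([[-1.0, 0.0, 1.0],
                       [-2.0, 0.0, 2.0],
                       [-1.0, 0.0, 1.0]], device=x.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)  # y-gradient kernel is the transpose of kx
    gx = F.conv2d(x, kx, padding=1)
    gy = F.conv2d(x, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)


class EdgeAttentionFusion(nn.Module):
    """Fuse infrared and visible features, gated by an edge-derived attention map."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv collapses the two edge maps into one attention map in [0, 1].
        self.attn = nn.Sequential(nn.Conv2d(2, 1, kernel_size=1), nn.Sigmoid())
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, feat_ir, feat_vis, ir_img, vis_img):
        # Attention from the edges of both source images, resized to the
        # feature resolution of this scale.
        edges = torch.cat([sobel_edges(ir_img), sobel_edges(vis_img)], dim=1)
        edges = F.interpolate(edges, size=feat_ir.shape[-2:],
                              mode="bilinear", align_corners=False)
        a = self.attn(edges)                                   # (B, 1, H, W)
        fused = self.merge(torch.cat([feat_ir, feat_vis], dim=1))
        # Emphasize responses near shared structures; attenuate flat, noisy regions.
        return fused * a


if __name__ == "__main__":
    ir = torch.rand(1, 1, 128, 128)       # infrared image
    vis = torch.rand(1, 1, 128, 128)      # visible image (grayscale for simplicity)
    enc = nn.Conv2d(1, 16, 3, padding=1)  # stand-in single-scale encoder
    fuse = EdgeAttentionFusion(16)
    print(fuse(enc(ir), enc(vis), ir, vis).shape)  # torch.Size([1, 16, 128, 128])
```

In the paper's design the attention is applied over multi-scale features, so structural edges can steer the fusion at every resolution; this single-scale sketch only conveys the gating principle.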
ISSN: 1051-8215
EISSN: 1558-2205
DOI: 10.1109/TCSVT.2021.3056725