ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection
Published in | Pattern Recognition Vol. 145, p. 109913 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 01.01.2024 |
Summary: | Effective feature fusion of multispectral images plays a crucial role in multispectral object detection. Previous studies have demonstrated the effectiveness of feature fusion using convolutional neural networks, but these methods are sensitive to image misalignment because their feature interaction is inherently limited to a local range, which leads to performance degradation. To address this issue, a novel feature fusion framework of dual cross-attention transformers is proposed to model global feature interaction and simultaneously capture complementary information across modalities. This framework enhances the discriminability of object features through a query-guided cross-attention mechanism, leading to improved performance. However, stacking multiple transformer blocks for feature enhancement incurs a large number of parameters and high spatial complexity. To handle this, inspired by the human process of reviewing knowledge, an iterative interaction mechanism is proposed that shares parameters among block-wise multimodal transformers, reducing model complexity and computation cost. The proposed method is general and effective: it can be integrated into different detection frameworks and used with different backbones. Experimental results on the KAIST, FLIR, and VEDAI datasets show that the proposed method achieves superior performance and faster inference, making it suitable for various practical scenarios. Code will be available at https://github.com/chanchanchan97/ICAFusion. |
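The abstract describes two ideas: query-guided cross-attention between the two modalities, and iterating that same attention block with shared weights instead of stacking independent blocks. The sketch below is not the authors' implementation; it is a minimal NumPy illustration of the general pattern, where each modality queries the other and the same projection weights (`Wq`, `Wk`, `Wv`, names assumed here) are reused across iterations.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, kv_feats, Wq, Wk, Wv):
    # Queries come from one modality; keys/values from the other,
    # so each modality attends to complementary information globally.
    Q = query_feats @ Wq
    K = kv_feats @ Wk
    V = kv_feats @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
d = 32   # channel dimension (illustrative)
n = 49   # spatial tokens, e.g. a flattened 7x7 feature map
rgb = rng.standard_normal((n, d))
thermal = rng.standard_normal((n, d))

# One shared set of projection weights, reused at every iteration
# (this is the parameter-sharing idea, not N independent blocks).
Wq, Wk, Wv = (0.1 * rng.standard_normal((d, d)) for _ in range(3))

for _ in range(3):  # iterative refinement with shared parameters
    rgb_new = rgb + cross_attention(rgb, thermal, Wq, Wk, Wv)
    thermal_new = thermal + cross_attention(thermal, rgb, Wq, Wk, Wv)
    rgb, thermal = rgb_new, thermal_new

fused = rgb + thermal  # fused multispectral features for the detector head
```

Single-head attention and a plain additive fusion are used here purely to keep the sketch short; the paper's modules are more elaborate.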
•A novel dual cross-attention feature fusion method is proposed for multispectral object detection, which simultaneously aggregates complementary information from RGB and thermal images.
•An iterative learning strategy is tailored for efficient multispectral feature fusion, further improving model performance without increasing the number of learnable parameters.
•The proposed feature fusion method is both generalizable and effective: it can be plugged into different backbones and equipped with different detection frameworks.
•The proposed CFE/ICFE module can function with different input image modalities, providing a feasible solution when one modality is missing or of poor quality.
•The proposed method achieves state-of-the-art results on the KAIST, FLIR, and VEDAI datasets while also obtaining very fast inference speed. |
---|---|
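The claim that iteration improves performance "without additional increase of learnable parameters" follows from simple counting: N stacked blocks hold N independent copies of the block weights, while one block iterated N times holds a single copy. A toy calculation (projection matrices only, biases and other layers omitted; numbers are illustrative, not from the paper):

```python
d = 256                        # channel dimension (illustrative)
params_per_block = 3 * d * d   # Wq, Wk, Wv projections of one attention block

N = 4
stacked = N * params_per_block  # N independent transformer blocks
shared = params_per_block       # one block, iterated N times with shared weights

# Sharing cuts the block parameters by exactly a factor of N.
print(f"stacked: {stacked:,}  shared: {shared:,}")
```

The compute cost per forward pass is the same in both cases; only the parameter storage (and hence model size) shrinks.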
ISSN: | 0031-3203 1873-5142 |
DOI: | 10.1016/j.patcog.2023.109913 |