ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection
Published in | Pattern Recognition Vol. 145, p. 109913 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 01.01.2024 |
Summary: | Effective feature fusion of multispectral images plays a crucial role in multispectral object detection. Previous studies have demonstrated the effectiveness of feature fusion using convolutional neural networks, but these methods are sensitive to image misalignment because their feature interaction is inherently limited to a local range, which leads to performance degradation. To address this issue, a novel feature fusion framework of dual cross-attention transformers is proposed to model global feature interaction and simultaneously capture complementary information across modalities. This framework enhances the discriminability of object features through a query-guided cross-attention mechanism, leading to improved performance. However, stacking multiple transformer blocks for feature enhancement incurs a large number of parameters and high spatial complexity. To handle this, inspired by the human process of reviewing knowledge, an iterative interaction mechanism is proposed that shares parameters among block-wise multimodal transformers, reducing model complexity and computation cost. The proposed method is general and effective: it can be integrated into different detection frameworks and used with different backbones. Experimental results on the KAIST, FLIR, and VEDAI datasets show that the proposed method achieves superior performance and faster inference, making it suitable for various practical scenarios. Code will be available at https://github.com/chanchanchan97/ICAFusion. |
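The abstract describes two ideas: query-guided cross-attention between the two modalities, and iterating that same attention block with shared weights instead of stacking independent blocks. The sketch below is not the authors' implementation; it is a minimal NumPy illustration of the general pattern, where each modality queries the other and the same projection weights (`Wq`, `Wk`, `Wv`, names assumed here) are reused across iterations.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, kv_feats, Wq, Wk, Wv):
    # Queries come from one modality; keys/values from the other,
    # so each modality attends to complementary information globally.
    Q = query_feats @ Wq
    K = kv_feats @ Wk
    V = kv_feats @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
d = 32   # channel dimension (illustrative)
n = 49   # spatial tokens, e.g. a flattened 7x7 feature map
rgb = rng.standard_normal((n, d))
thermal = rng.standard_normal((n, d))

# One shared set of projection weights, reused at every iteration
# (this is the parameter-sharing idea, not N independent blocks).
Wq, Wk, Wv = (0.1 * rng.standard_normal((d, d)) for _ in range(3))

for _ in range(3):  # iterative refinement with shared parameters
    rgb_new = rgb + cross_attention(rgb, thermal, Wq, Wk, Wv)
    thermal_new = thermal + cross_attention(thermal, rgb, Wq, Wk, Wv)
    rgb, thermal = rgb_new, thermal_new

fused = rgb + thermal  # fused multispectral features for the detector head
```

Single-head attention and a plain additive fusion are used here purely to keep the sketch short; the paper's modules are more elaborate.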
•A novel dual cross-attention feature fusion method is proposed for multispectral object detection, which simultaneously aggregates complementary information from RGB and thermal images.
•An iterative learning strategy is tailored for efficient multispectral feature fusion, further improving model performance without increasing the number of learnable parameters.
•The proposed feature fusion method is both generalizable and effective: it can be plugged into different backbones and equipped with different detection frameworks.
•The proposed CFE/ICFE module can function with different input image modalities, providing a feasible solution when one modality is missing or of poor quality.
•The proposed method achieves state-of-the-art results on the KAIST, FLIR, and VEDAI datasets while also obtaining very fast inference speed. |
---|---|
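The claim that iteration improves performance "without additional increase of learnable parameters" follows from simple counting: N stacked blocks hold N independent copies of the block weights, while one block iterated N times holds a single copy. A toy calculation (projection matrices only, biases and other layers omitted; numbers are illustrative, not from the paper):

```python
d = 256                        # channel dimension (illustrative)
params_per_block = 3 * d * d   # Wq, Wk, Wv projections of one attention block

N = 4
stacked = N * params_per_block  # N independent transformer blocks
shared = params_per_block       # one block, iterated N times with shared weights

# Sharing cuts the block parameters by exactly a factor of N.
print(f"stacked: {stacked:,}  shared: {shared:,}")
```

The compute cost per forward pass is the same in both cases; only the parameter storage (and hence model size) shrinks.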
ISSN: | 0031-3203 1873-5142 |
DOI: | 10.1016/j.patcog.2023.109913 |