ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection
Published in | Pattern recognition Vol. 145; p. 109913 |
---|---|
Main Authors | Shen, Jifeng; Chen, Yifei; Liu, Yue; Zuo, Xin; Fan, Heng; Yang, Wankou |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 01.01.2024 |
Abstract | Effective feature fusion of multispectral images plays a crucial role in multispectral object detection. Previous studies have demonstrated the effectiveness of feature fusion using convolutional neural networks, but these methods are sensitive to image misalignment due to their inherent deficiency in local-range feature interaction, resulting in performance degradation. To address this issue, a novel feature fusion framework of dual cross-attention transformers is proposed to model global feature interaction and capture complementary information across modalities simultaneously. This framework enhances the discriminability of object features through the query-guided cross-attention mechanism, leading to improved performance. However, stacking multiple transformer blocks for feature enhancement incurs a large number of parameters and high spatial complexity. To handle this, inspired by the human process of reviewing knowledge, an iterative interaction mechanism is proposed to share parameters among block-wise multimodal transformers, reducing model complexity and computation cost. The proposed method is general and can be effectively integrated into different detection frameworks and used with different backbones. Experimental results on the KAIST, FLIR, and VEDAI datasets show that the proposed method achieves superior performance and faster inference, making it suitable for various practical scenarios. Code will be available at https://github.com/chanchanchan97/ICAFusion. |
•A novel dual cross-attention feature fusion method is proposed for multispectral object detection, which simultaneously aggregates complementary information from RGB and thermal images.
•An iterative learning strategy is tailored for efficient multispectral feature fusion, which further improves model performance without an additional increase in learnable parameters.
•The proposed feature fusion method is both generalizable and effective, and can be plugged into different backbones and equipped with different detection frameworks.
•The proposed CFE/ICFE module can function with different input image modalities, providing a feasible solution when one of the modalities is missing or has poor quality.
•The proposed method achieves state-of-the-art results on the KAIST, FLIR, and VEDAI datasets, while also obtaining very fast inference speed.
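The two key ideas in the abstract, query-guided cross-attention between modalities and iterating a single shared set of weights instead of stacking distinct blocks, can be illustrated with a minimal NumPy sketch. All names, shapes, and the residual/concatenation choices below are illustrative assumptions, not the paper's actual ICAFusion implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 8   # feature dimension (illustrative)
n = 16  # spatial tokens per modality (illustrative)

# ONE shared set of projection weights, reused across all iterations:
# this mirrors the parameter-sharing idea, not the paper's exact layers.
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

def cross_attend(query_feats, context_feats):
    """Query one modality with the other (query-guided cross-attention)."""
    Q, K, V = query_feats @ Wq, context_feats @ Wk, context_feats @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d))
    return query_feats + attn @ V  # residual update

rgb = rng.standard_normal((n, d))
thermal = rng.standard_normal((n, d))

# Iterative interaction: the SAME weights are applied for several rounds,
# each modality attending to the other (both updates use the old values).
for _ in range(3):
    rgb, thermal = cross_attend(rgb, thermal), cross_attend(thermal, rgb)

fused = np.concatenate([rgb, thermal], axis=-1)  # simple channel concat
print(fused.shape)  # (16, 16)
```

Because the loop reuses `Wq`, `Wk`, `Wv`, the parameter count stays constant no matter how many fusion rounds run, which is the complexity saving the abstract describes; a stacked design would instead allocate a fresh weight triple per block.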
ArticleNumber | 109913 |
Author | Shen, Jifeng; Zuo, Xin; Fan, Heng; Chen, Yifei; Yang, Wankou; Liu, Yue |
Author_xml | 1. Shen, Jifeng (shenjifeng@ujs.edu.cn): School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, 212013, China; 2. Chen, Yifei: School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, 212013, China; 3. Liu, Yue: School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, 212013, China; 4. Zuo, Xin: School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang, 212003, China; 5. Fan, Heng: Department of Computer Science and Engineering, University of North Texas, Denton, TX 76207, USA; 6. Yang, Wankou: School of Automation, Southeast University, Nanjing, 210096, China |
ContentType | Journal Article |
Copyright | 2023 Elsevier Ltd |
DOI | 10.1016/j.patcog.2023.109913 |
Discipline | Computer Science |
EISSN | 1873-5142 |
ISSN | 0031-3203 |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | Iterative feature fusion; Transformer; Cross-attention; Multispectral object detection |
Language | English |
PublicationDate | January 2024 |
PublicationTitle | Pattern recognition |
PublicationYear | 2024 |
Publisher | Elsevier Ltd |
Sci. doi: 10.1007/s11432-021-3493-7 – volume: 133 issn: 0031-3203 year: 2023 ident: 10.1016/j.patcog.2023.109913_b1 article-title: A full data augmentation pipeline for small object detection based on generative adversarial networks publication-title: Pattern Recognit. doi: 10.1016/j.patcog.2022.108998 |
Snippet | Effective feature fusion of multispectral images plays a crucial role in multispectral object detection. Previous studies have demonstrated the effectiveness... |
StartPage | 109913 |
SubjectTerms | Cross-attention; Iterative feature fusion; Multispectral object detection; Transformer |
Title | ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection |
URI | https://dx.doi.org/10.1016/j.patcog.2023.109913 |
Volume | 145 |
Authors | Shen, Jifeng; Chen, Yifei; Liu, Yue; Zuo, Xin |
Journal | Pattern recognition |
Date | 2024-01-01 |
Publisher | Elsevier Ltd |
ISSN | 0031-3203 |
EISSN | 1873-5142 |
DOI | 10.1016/j.patcog.2023.109913 |
ExternalDocID | S0031320323006118 |