MCAFNet: Multiscale cross-modality adaptive fusion network for multispectral object detection

Bibliographic Details
Published in: Digital Signal Processing, Vol. 159, p. 104996
Main Authors: Zheng, Shangpo; Junfeng, Liu; Zeng, Jun
Format: Journal Article
Language: English
Published: Elsevier Inc, 01.04.2025

Abstract Multispectral object detection techniques integrate data from various spectral modalities, such as combining thermal images with RGB visible-light images, to enhance the precision and robustness of object detection under diverse environmental conditions. Although this approach has improved detection capabilities, significant challenges remain in fully leveraging the detail information specific to each single modality and in accurately capturing the feature information shared across modalities. To address these challenges, we propose a Multiscale Cross-modality Adaptive Fusion Network (MCAFNet). The network incorporates a Cross-modality Interactive Transformer (CMIT) module, a Multimodal Adaptive Weighted Fusion (MAWF) module, and a 3D-Integrated Attention Feature Enhancement (3D-IAFE) module. These components work together to extract both the complementary features between modalities and the detailed features specific to each modality, thereby improving the accuracy and robustness of multimodal object detection. Extensive experiments and in-depth ablation studies confirm the effectiveness of the proposed method, which achieves state-of-the-art detection performance on multiple public datasets.
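The record does not describe how the MAWF module computes its fusion weights. As a rough illustration of the general idea of adaptive weighted fusion of RGB and thermal features, here is a minimal pure-Python sketch. It assumes, hypothetically, that per-channel weights come from a softmax over the globally pooled activations of each modality. It is not the paper's MAWF module. The names `adaptive_weighted_fusion`, `global_avg_pool`, and `temperature` are illustrative only.

```python
import math
from typing import List

# A feature map is stored as channels x flattened spatial positions.
Feature = List[List[float]]


def global_avg_pool(feat: Feature) -> List[float]:
    """Average each channel over its spatial positions."""
    return [sum(ch) / len(ch) for ch in feat]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_weighted_fusion(rgb: Feature, thermal: Feature,
                             temperature: List[float]) -> Feature:
    """Fuse two same-shaped feature maps with per-channel adaptive weights.

    Hypothetical scheme (not MCAFNet's MAWF): for each channel, the pooled
    responses of the two modalities are scaled by a per-channel factor
    (a learnable parameter in a real network, fixed here) and passed
    through a softmax, so the two weights are positive and sum to 1.
    """
    if len(rgb) != len(thermal) or len(rgb) != len(temperature):
        raise ValueError("channel counts must match")
    pooled_rgb = global_avg_pool(rgb)
    pooled_th = global_avg_pool(thermal)
    fused: Feature = []
    for c, (ch_rgb, ch_th) in enumerate(zip(rgb, thermal)):
        w_rgb, w_th = softmax([temperature[c] * pooled_rgb[c],
                               temperature[c] * pooled_th[c]])
        fused.append([w_rgb * x + w_th * y for x, y in zip(ch_rgb, ch_th)])
    return fused


if __name__ == "__main__":
    # One channel, two positions: the thermal map responds more strongly,
    # so it receives the larger weight (about 0.88 vs 0.12 here).
    rgb = [[0.0, 0.0]]
    thermal = [[2.0, 2.0]]
    print(adaptive_weighted_fusion(rgb, thermal, temperature=[1.0]))
```

Because the weights are normalized per channel, whichever modality has the stronger pooled response dominates that channel. Identical inputs are averaged equally and pass through unchanged.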
ArticleNumber 104996
Author_xml – sequence: 1
  givenname: Shangpo
  surname: Zheng
  fullname: Zheng, Shangpo
  organization: School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, PR China
– sequence: 2
  givenname: Liu
  surname: Junfeng
  fullname: Junfeng, Liu
  organization: School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, PR China
– sequence: 3
  givenname: Jun
  surname: Zeng
  fullname: Zeng, Jun
  email: junzeng@scut.edu.cn
  organization: School of Electric Power Engineering, South China University of Technology, Guangzhou 510641, PR China
ContentType Journal Article
Copyright 2025
DOI 10.1016/j.dsp.2025.104996
Discipline Engineering
ExternalDocumentID 10_1016_j_dsp_2025_104996
S1051200425000181
ISSN 1051-2004
IsPeerReviewed true
IsScholarly true
Keywords multimodal adaptive feature fusion
transformer
Attention mechanism
multispectral object detection
cross-modality
PublicationDate April 2025
PublicationTitle Digital Signal Processing
PublicationYear 2025
Publisher Elsevier Inc
References Tang, He, Liu, Duan, Si (bib0030) Jul. 2023; 33
He, Zhang, Ren, Sun (bib0017) Jun. 2016
Zhang, Lei, Xie, Fang, Li, Du (bib0065) 2023; 61
Prakash, Chitta, Geiger (bib0043) Jun. 2021
S. Pei, J. Lin, W. Liu, T. Zhao, and C.-W. Lin, “Beyond night visibility: adaptive multi-scale fusion of infrared and visible images,” 2024. [Online]. Available
Wang, Girshick, Gupta, He (bib0049) Jun. 2018
Li, Zhang, Hu, Zhu, Fu, Chen (bib0064) April 2024; 34
Ren, He, Girshick, Sun (bib0053) 2015; 28
Fang, Yamada, Ninomiya (bib0032) Nov. 2004; 53
Redmon J, Farhadi A. YOLOv3: an incremental improvement [EB/OL]. (2018-05-25). [2024-05-20].
Zhang, Fromont, Lefevre, Avignon (bib0056) Sep. 2021
You, Xie, Feng, Mei, Ji (bib0023) Aug. 2023; 30
Zhang, Fromont, Lefèvre, Avignon (bib0010) 2021
G. J. et al., “ultralytics/yolov5: v5.0,” 2021. [Online]. Available
Yang, Liu, Huang, Wan, Wen, Guan (bib0011) Dec. 2021; 31
He, Zhang, Ren, Sun (bib0015) 2014
Ren, He, Girshick, Sun (bib0019) Jun. 2017; 39
Zheng (bib0051) Aug. 2022; 52
Qingyun, Zhaokui (bib0062) 2022; 130
Q. Fang, D. Han, and Z. Wang, “Cross-modality fusion transformer for multispectral object detection,” 2022. [Online]. Available
Zhang, Liu, Zhang, Yang, Qiao, Huang, Hussain (bib0036) Oct. 2019; 50
Wang, Chen, Shao, Li, Zhang (bib0025) Jul. 2022; 71
Teutsch, Muller, Huber (bib0004) Sep. 2014
Krizhevsky, Sutskever, Hinton (bib0016) Sep. 2012
Dosovitskiy, Kolesnikov, Weissenborn, Zhai, Unterthiner, Dehghani, Minderer, Heigold, Gelly, Uszkoreit, Houlsby (bib0041) Jun. 2021
.
Jia, Zhu, Li, Tang, Zhou (bib0038) Oct. 2021
Jin, Guo, He, Xu, Wang, Su (bib0013) 2022; 491
Chen, Shi, Ye, Mertz, Ramanan, Kong (bib0067) Oct. 2022
Team (bib0039)
Liu, Fan, Jiang, Liu, Luo (bib0008) Jan. 2022; 32
Liu (bib0042) Oct. 2021
Sun, Cao, Zhu, Hu (bib0060) Oct. 2022; 32
Dhanaraj, Sharma, Sarkar, Karnam, Chachlakis, Ptucha, Markopoulos, Saber (bib0061) May. 2020
Zhang, Chen, Huang (bib0027) Apr. 2022
Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, Polosukhin (bib0040) Jun. 2017
V. Vibashan, J. Maria Jose Valanarasu, P. Oza, and V.M. Patel, “Imagefusion transformer,” 2021, arXiv:2107.09011.
Bao, Huang, Hu, Xiang (bib0028) 2022; 13534
González, Fang, Socarras (bib0033) Jun. 2016; 16
Jeong, Ko, Nam (bib0005) Jun. 2017; 27
Zhang, Liu, Zhang, Yang, Qiao, Huang, Hussain (bib0009) 2019; 50
Liu, Lam, Zhao, Qiu (bib0022) Jan. 2021; 32
Fuhr, Jung (bib0002) May 2017; 27
Menze, Geiger (bib0001) Jun. 2015
Wang, Wu, Zhu, Li, Zuo, Hu (bib0047) Jun. 2020
Yang, Zhang, Li, Xie (bib0048) Jul. 2021
Lee, Jovanov, Philips (bib0006) Jul. 2022
Li, Pan, Zhang, Wang, Yu (bib0070) April. 2024
Tang, Xiang, Zhang, Gong, Ma (bib0068) Mar. 2023; 91
Wang, Wang, Wu, Xu, Zhang (bib0007) Jun. 2022; 32
Woo, Park, Lee, So Kweon (bib0046) Oct. 2018
Radford, Hallacy, Ramesh, Goh, Agarwal, Sastry, Askell, Mishkin, Clark, Krueger, Sutskever (bib0044) Jul. 2021; 139
Pei, Lin, Liu, Zhao, Lin (bib0072) Mar. 2024
Zhou, Sun, Ren, Wang (bib0059) Sep. 2021; 13
Jiang, Cai, Yang (bib0055) Sep. 2022; 47
Jin, Yi, Xu (bib0014) Nov. 2022; 32
Zhao (bib0026) Apr. 2023
Wagner, Fischer, Herman, Behnke (bib0034) Apr., 2016
Zhang, Fromont, Lefevre, Avignon (bib0058) Jan. 2021
Cao, Yang, Zhao (bib0012) 2021; 21
Zhang, Fromont, Lefèvre, Avignon (bib0057) Oct. 2020
Razakarivony, Jurie (bib0037) Jan. 2016; 34
Liu, Zhang, Wang, Metaxas (bib0035) Sep. 2016
Hu, Shen, Sun (bib0045) Jun. 2018
Liu (bib0018) Dec. 2019
Bilal, Khan, Khan, Kyung (bib0003) Oct. 2017; 27
Zhang, Wang, Dayoub, Sunderhauf (bib0031) Jun. 2020
Selvaraju, Cogswell, Das, Vedantam, Parikh, Batra (bib0063) Oct. 2017
Cao, Bin, Hamari, Blasch, Liu (bib0069) Jun. 2023
Shen, Chen, Liu, Zuo, Fan, Yang (bib0071) 2024; 145
Redmon, Divvala, Girshick, Farhadi (bib0020) Jun. 2016
Zhou, Chen, Cao (bib0021) Dec. 2020
Teutsch (10.1016/j.dsp.2025.104996_bib0004) 2014
Jeong (10.1016/j.dsp.2025.104996_bib0005) 2017; 27
Ren (10.1016/j.dsp.2025.104996_bib0019) 2017; 39
You (10.1016/j.dsp.2025.104996_bib0023) 2023; 30
Zhang (10.1016/j.dsp.2025.104996_bib0057) 2020
Dhanaraj (10.1016/j.dsp.2025.104996_bib0061) 2020
Wagner (10.1016/j.dsp.2025.104996_bib0034) 2016
Liu (10.1016/j.dsp.2025.104996_bib0035) 2016
He (10.1016/j.dsp.2025.104996_bib0015) 2014
Ren (10.1016/j.dsp.2025.104996_bib0053) 2015; 28
Menze (10.1016/j.dsp.2025.104996_bib0001) 2015
Fang (10.1016/j.dsp.2025.104996_bib0032) 2004; 53
Tang (10.1016/j.dsp.2025.104996_bib0030) 2023; 33
Liu (10.1016/j.dsp.2025.104996_bib0042) 2021
Razakarivony (10.1016/j.dsp.2025.104996_bib0037) 2016; 34
10.1016/j.dsp.2025.104996_bib0050
10.1016/j.dsp.2025.104996_bib0052
Zhou (10.1016/j.dsp.2025.104996_bib0021) 2020
Liu (10.1016/j.dsp.2025.104996_bib0022) 2021; 32
Yang (10.1016/j.dsp.2025.104996_bib0011) 2021; 31
Vaswani (10.1016/j.dsp.2025.104996_bib0040) 2017
Zhang (10.1016/j.dsp.2025.104996_bib0010) 2021
Jiang (10.1016/j.dsp.2025.104996_bib0055) 2022; 47
Lee (10.1016/j.dsp.2025.104996_bib0006) 2022
Zhao (10.1016/j.dsp.2025.104996_bib0026) 2023
Yang (10.1016/j.dsp.2025.104996_bib0048) 2021
Tang (10.1016/j.dsp.2025.104996_bib0068) 2023; 91
Pei (10.1016/j.dsp.2025.104996_bib0072) 2024
Liu (10.1016/j.dsp.2025.104996_bib0018) 2019
Qingyun (10.1016/j.dsp.2025.104996_bib0062) 2022; 130
Zhang (10.1016/j.dsp.2025.104996_bib0031) 2020
Li (10.1016/j.dsp.2025.104996_bib0070) 2024
Cao (10.1016/j.dsp.2025.104996_bib0012) 2021; 21
Cao (10.1016/j.dsp.2025.104996_bib0069) 2023
Jin (10.1016/j.dsp.2025.104996_bib0014) 2022; 32
Hu (10.1016/j.dsp.2025.104996_bib0045) 2018
Krizhevsky (10.1016/j.dsp.2025.104996_bib0016) 2012
Fuhr (10.1016/j.dsp.2025.104996_bib0002) 2017; 27
Prakash (10.1016/j.dsp.2025.104996_bib0043) 2021
10.1016/j.dsp.2025.104996_bib0066
10.1016/j.dsp.2025.104996_bib0024
Wang (10.1016/j.dsp.2025.104996_bib0025) 2022; 71
Woo (10.1016/j.dsp.2025.104996_bib0046) 2018
Chen (10.1016/j.dsp.2025.104996_bib0067) 2022
Zhang (10.1016/j.dsp.2025.104996_bib0036) 2019; 50
Zhou (10.1016/j.dsp.2025.104996_bib0059) 2021; 13
Jin (10.1016/j.dsp.2025.104996_bib0013) 2022; 491
Zhang (10.1016/j.dsp.2025.104996_bib0065) 2023; 61
Zhang (10.1016/j.dsp.2025.104996_bib0009) 2019; 50
10.1016/j.dsp.2025.104996_bib0029
Wang (10.1016/j.dsp.2025.104996_bib0047) 2020
Sun (10.1016/j.dsp.2025.104996_bib0060) 2022; 32
Jia (10.1016/j.dsp.2025.104996_bib0038) 2021
Liu (10.1016/j.dsp.2025.104996_bib0008) 2022; 32
Bao (10.1016/j.dsp.2025.104996_bib0028) 2022; 13534
Redmon (10.1016/j.dsp.2025.104996_bib0020) 2016
González (10.1016/j.dsp.2025.104996_bib0033) 2016; 16
Zhang (10.1016/j.dsp.2025.104996_bib0058) 2021
Wang (10.1016/j.dsp.2025.104996_bib0049) 2018
Selvaraju (10.1016/j.dsp.2025.104996_bib0063) 2017
Bilal (10.1016/j.dsp.2025.104996_bib0003) 2017; 27
Dosovitskiy (10.1016/j.dsp.2025.104996_bib0041) 2021
Team (10.1016/j.dsp.2025.104996_bib0039)
Zheng (10.1016/j.dsp.2025.104996_bib0051) 2022; 52
Zhang (10.1016/j.dsp.2025.104996_bib0056) 2021
He (10.1016/j.dsp.2025.104996_bib0017) 2016
Zhang (10.1016/j.dsp.2025.104996_bib0027) 2022
Wang (10.1016/j.dsp.2025.104996_bib0007) 2022; 32
Radford (10.1016/j.dsp.2025.104996_bib0044) 2021; 139
Shen (10.1016/j.dsp.2025.104996_bib0071) 2024; 145
Li (10.1016/j.dsp.2025.104996_bib0064) 2024; 34
References_xml – volume: 27
  start-page: 1132
  year: May 2017
  end-page: 1142
  ident: bib0002
  article-title: Camera self-calibration based on nonlinear optimization and applications in surveillance systems
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
– reference: Q. Fang, D. Han, and Z. Wang, “Cross-modality fusion transformer for multispectral object detection,” 2022. [Online]. Available:
– start-page: 7073
  year: Jun. 2021
  end-page: 7083
  ident: bib0043
  article-title: Multi-modal fusion transformer for end-to-end autonomous driving
  publication-title: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR)
– start-page: 449
  year: Sep. 2021
  end-page: 453
  ident: bib0056
  article-title: Deep active learningfrom multispectral data through cross-modality prediction inconsistency
– year: Apr., 2016
  ident: bib0034
  article-title: Multispectral pedestrian detection using deep fusion convolutional neural networks
  publication-title: . Neural Netw. (ESANN)
– volume: 34
  start-page: 187
  year: Jan. 2016
  end-page: 203
  ident: bib0037
  article-title: Vehicle detection in aerial imagery: a small target detection benchmark
  publication-title: Journal of Visual Communication and Image Representation
– start-page: 5906
  year: Apr. 2023
  end-page: 5916
  ident: bib0026
  article-title: CDDFuse: correlation-driven dual-branch feature decomposition for multi-modality image fusion
  publication-title: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR)
– volume: 27
  start-page: 2260
  year: Oct. 2017
  end-page: 2273
  ident: bib0003
  article-title: A low complexity pedestrian detection framework for smart video surveillance systems
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
– volume: 32
  start-page: 105
  year: Jan. 2022
  end-page: 119
  ident: bib0008
  article-title: Learning a deep multiscale feature ensemble and an edge-attention guidance for image fusion
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
– reference: Redmon J, Farhadi A. YOLOv3: an incremental improvement [EB/OL]. (2018-05-25). [2024-05-20].
– volume: 13
  start-page: 3656
  year: Sep. 2021
  ident: bib0059
  article-title: Visible-thermal image object detection via the combination of illumination conditions and temperature information
  publication-title: Remote Sens
– start-page: 9992
  year: Oct. 2021
  end-page: 10002
  ident: bib0042
  article-title: Swin transformer: hierarchical vision transformer usingshifted windows
  publication-title: Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV)
– start-page: 770
  year: Jun. 2016
  end-page: 778
  ident: bib0017
  article-title: Deep residual learning for image recognition
  publication-title: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR)
– year: 2014
  ident: bib0015
  article-title: Spatial pyramid pooling in deep convolutional networks for visual recognition
  publication-title: Proc. Eur. Conf. Comput. Vis. (ECCV)
– ident: bib0039
  article-title: Free flir thermal dataset for algorithm training
– volume: 50
  start-page: 20
  year: 2019
  end-page: 29
  ident: bib0009
  article-title: Cross-modality interactive attention network for multispectral pedestrian detection
  publication-title: Inf. Fusion
– start-page: 72
  year: 2021
  end-page: 80
  ident: bib0010
  article-title: Guided attentive feature fusion for multispectral pedestrian detection
  publication-title: Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Jan. 3-8
– volume: 32
  start-page: 315
  year: Jan. 2021
  end-page: 329
  ident: bib0022
  article-title: Deep cross-modal representation learning and distillation for illumination-invariant pedestrian detection
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
– reference: V. Vibashan, J. Maria Jose Valanarasu, P. Oza, and V.M. Patel, “Imagefusion transformer,” 2021, arXiv:2107.09011.
– start-page: 1
  year: Oct. 2020
  end-page: 5
  ident: bib0057
  article-title: Multispectral fusion for object detection with cyclic fuse-and-refine blocks
  publication-title: Proc. IEEE Int. Conf. Image Process. (ICIP)
– reference: G. J. et al., “ultralytics/yolov5: v5.0,” 2021. [Online]. Available:
– year: Jul. 2022
  ident: bib0006
  article-title: Cross-modality attention and multimodal fusion transformer for pedestrian detection
  publication-title: Proc. Eur. Conf. Comput. Vis. (ECCV) Workshops
– start-page: 787
  year: Dec. 2020
  end-page: 803
  ident: bib0021
  article-title: Improving multispectral pedestrian detection by addressing modality imbalance problems
  publication-title: Proc. Eur. Conf. Comput. Vis. (ECCV)
– start-page: 72
  year: Jan. 2021
  end-page: 80
  ident: bib0058
  article-title: Guided attentive feature fusion for multispectral pedestrian detection
  publication-title: Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV)
– year: Jun. 2021
  ident: bib0041
  article-title: An image is worth 16×16 words: transformers for image recognition at scale
  publication-title: Proc. Int. Conf. Learn. Represent. (ICLR)
– volume: 33
  start-page: 3159
  year: Jul. 2023
  end-page: 3172
  ident: bib0030
  article-title: DATFuse: Infrared and visible image fusion via dual attention transformer
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
– start-page: 7132
  year: Jun. 2018
  end-page: 7141
  ident: bib0045
  article-title: Squeeze-and-excitation networks
  publication-title: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR)
– year: Mar. 2024
  ident: bib0072
  article-title: Beyond night visibility: adaptive multi-scale fusion of infrared and visible images
  publication-title: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR)
– start-page: 618
  year: Oct. 2017
  end-page: 626
  ident: bib0063
  article-title: Grad-CAM: visual explanations from deep networks via gradient-based localization
  publication-title: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR)
– start-page: 21
  year: Dec. 2019
  end-page: 37
  ident: bib0018
  article-title: SSD: Single shot multibox detector
  publication-title: Proc. Eur. Conf. Comput. Vis. (ECCV)
– volume: 91
  start-page: 477
  year: Mar. 2023
  end-page: 493
  ident: bib0068
  article-title: DivFusion: darkness-free infrared and visible image fusion
  publication-title: Inf. Fusion
– year: Sep. 2016
  ident: bib0035
  article-title: Multispectral deep neural networks for pedestrian detection
  publication-title: Proc. British Mach. Vis. Conf. (BMVC)
– volume: 27
  start-page: 1368
  year: Jun. 2017
  end-page: 1380
  ident: bib0005
  article-title: Early detection of sudden pedestrian crossing for safe driving during summer nights
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
– volume: 28
  start-page: 91
  year: 2015
  end-page: 99
  ident: bib0053
  article-title: Faster R-CNN: Towards real-time object detection with region proposal networks
  publication-title: Proc. Adv. Neural Inf. Process. Syst.
– year: April. 2024
  ident: bib0070
  article-title: MambaDFuse: a mamba-based dual-phase model for multi-modality image fusion
  publication-title: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR)
– volume: 32
  start-page: 3360
  year: Jun. 2022
  end-page: 3374
  ident: bib0007
  article-title: UNFusion: A unified multi-scale densely connected network for infrared and visible image fusion
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
– volume: 30
  start-page: 1172
  year: Aug. 2023
  end-page: 1176
  ident: bib0023
  article-title: Multi-scale aggregation transformers for multispectral object detection
  publication-title: IEEE Signal Processing Letters
– start-page: 3061
  year: Jun. 2015
  end-page: 3070
  ident: bib0001
  article-title: Object scene flow for autonomous vehicles
  publication-title: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR)
– volume: 32
  start-page: 6700
  year: Oct. 2022
  end-page: 6713
  ident: bib0060
  article-title: Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
– volume: 31
  start-page: 4771
  year: Dec. 2021
  end-page: 4783
  ident: bib0011
  article-title: Infrared and visible image fusion via texture conditional generative adversarial network
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
– start-page: 779
  year: Jun. 2016
  end-page: 788
  ident: bib0020
  article-title: You only look once: unified, real-time object detection
  publication-title: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR)
– volume: 145
  year: 2024
  ident: bib0071
  article-title: ICAFusion: iterative cross-attention guided feature fusion for multispectral object detection
  publication-title: Pattern Recognition
– volume: 21
  start-page: 4184
  year: 2021
  ident: bib0012
  article-title: Attention fusion for one-stage multispectral pedestrian detection
  publication-title: Sensors
– start-page: 11534
  year: Jun. 2020
  end-page: 11542
  ident: bib0047
  article-title: ECANet: Efficient channel attention for deep convolutional neural networks
  publication-title: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR)
– start-page: 3489
  year: Oct. 2021
  end-page: 3497
  ident: bib0038
  article-title: LLVIP: A visible-infrared paired dataset for low-light vision
  publication-title: Proc. IEEE/CVF Int. Conf. Comput. Vis. Workshops (ICCVW)
– start-page: 11863
  year: Jul. 2021
  end-page: 11874
  ident: bib0048
  article-title: SimAM: a simple, parameter-free attention module for convolutional neural networks
  publication-title: Proc. 38th Int. Conf. Mach. Learn. (ICML)
– volume: 34
  start-page: 3017
  year: April 2024
  end-page: 3029
  ident: bib0064
  article-title: Stabilizing multispectral pedestrian detection with evidential hybrid fusion
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
– start-page: 209
  year: Sep. 2014
  end-page: 216
  ident: bib0004
  article-title: Low resolution person detection with a moving thermal infrared camera by hot spot classification
  publication-title: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR)
– reference: S. Pei, J. Lin, W. Liu, T. Zhao, and C.-W. Lin, “Beyond night visibility: adaptive multi-scale fusion of infrared and visible images,” 2024. [Online]. Available:
– volume: 47
  start-page: 2289
  year: Sep. 2022
  end-page: 2303
  ident: bib0055
  article-title: IARet: A lightweight multiscale infrared aerocraft recognition algorithm
  publication-title: Arab. J. Sci. Eng.
– volume: 16
  start-page: 820
  year: Jun. 2016
  ident: bib0033
  article-title: Pedestrian detection at day/night time with visible and FIR cameras: A comparison
  publication-title: Sensors
– year: May. 2020
  ident: bib0061
  article-title: Vehicle detection from multi-modal aerial imagery using YOLOv3 with mid-level fusion
  publication-title: SPIE Defense+Commercial Sensing
– volume: 53
  start-page: 1679
  year: Nov. 2004
  end-page: 1697
  ident: bib0032
  article-title: A shape-independent method for pedestrian detection with far-infrared images
  publication-title: IEEE Trans. Veh. Technol.
– start-page: 6000
  year: Jun. 2017
  end-page: 6010
  ident: bib0040
  article-title: Attention is all you need
  publication-title: Proc. 31st Int. Conf. Neural Inf. Process. Syst. (NIPS)
– volume: 139
  start-page: 8748
  year: Jul. 2021
  end-page: 8763
  ident: bib0044
  article-title: Learning transferable visual models from natural language supervision
  publication-title: Proceedings of the 38th International Conference on Machine Learning (ICML)
– start-page: 3
  year: Oct. 2018
  end-page: 19
  ident: bib0046
  article-title: CBAM: Convolutional block attention module
  publication-title: Proc. Eur. Conf. Comput. Vis. (ECCV)
– volume: 61
  start-page: 1
  year: 2023
  end-page: 15
  ident: bib0065
  article-title: SuperYOLO: super resolution assisted object detection in multimodal remote sensing imagery
  publication-title: IEEE Transac. Geosci. Remote Sens.
– volume: 13534
  year: 2022
  ident: bib0028
  article-title: Attention-guided multi-modal and multi-scale fusion for multispectral pedestrian detection
  publication-title: Pattern Recognition and Computer Vision
– volume: 50
  start-page: 20
  year: Oct. 2019
  end-page: 29
  ident: bib0036
  article-title: Cross-modality interactive attention network for multispectral pedestrian detection
  publication-title: Inf. Fusion.
– start-page: 403
  year: Jun. 2023
  end-page: 411
  ident: bib0069
  article-title: Multimodal object detection by channel switching and spatial attention
  publication-title: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR)
– volume: 52
  start-page: 8574
  year: Aug. 2022
  end-page: 8586
  ident: bib0051
  article-title: Enhancing geometric factors in model learning and inference for object detection and instance segmentation
  publication-title: IEEE
– volume: 39
  start-page: 1137
  year: Jun. 2017
  end-page: 1149
  ident: bib0019
  article-title: Faster R-CNN: Towards real-time object detection with region proposal networks
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
– reference: .
– volume: 32
  start-page: 7632
  year: Nov. 2022
  end-page: 7645
  ident: bib0014
  article-title: MoADNet: mobile asymmetric dual-stream networks for real-time and lightweight RGB-D salient object detection
  publication-title: IEEE Transac. Circuit Syst. Video Technol.
– start-page: 8510
  year: Jun. 2020
  end-page: 8519
  ident: bib0031
  article-title: VarifocalNet: an IoU-aware dense object detector
  publication-title: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR)
– start-page: 898
  year: Apr. 2022
  end-page: 907
  ident: bib0027
  article-title: CAT-Det: Contrastively augmented transformer for multimodal 3D object detection
  publication-title: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR)
– volume: 491
  start-page: 414
  year: 2022
  end-page: 425
  ident: bib0013
  article-title: FCMNet: Frequency-aware cross-modality attention networks for RGB-D salient object detection
  publication-title: Neurocomputing
– volume: 130
  year: 2022
  ident: bib0062
  article-title: Cross-modality attentive feature fusion for object detection in multispectral remote sensing imagery
  publication-title: Pattern Recognit
– start-page: 1106
  year: Sep. 2012
  end-page: 1114
  ident: bib0016
  article-title: Imagenet classification with deep convolutional neural networks
  publication-title: Proc. Adv. Neural Inf. Process. Syst. (NeurIPS)
– start-page: 7794
  year: Jun. 2018
  end-page: 7803
  ident: bib0049
  article-title: Non-local neural networks
  publication-title: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR)
– year: Oct. 2022
  ident: bib0067
  article-title: Multimodal object detection via probabilistic ensembling
  publication-title: Proc. Eur. Conf. Comput. Vis. (ECCV)
– volume: 71
  start-page: 1
  year: Jul. 2022
  end-page: 12
  ident: bib0025
  article-title: SwinFuse: a residual swin transformer fusion network for infrared and visible images
  publication-title: IEEE Trans. Instrum. Meas.
– volume: 33
  start-page: 3159
  issue: 7
  year: 2023
  ident: 10.1016/j.dsp.2025.104996_bib0030
  article-title: DATFuse: Infrared and visible image fusion via dual attention transformer
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
  doi: 10.1109/TCSVT.2023.3234340
– start-page: 8510
  year: 2020
  ident: 10.1016/j.dsp.2025.104996_bib0031
  article-title: VarifocalNet: an IoU-aware dense object detector
– year: 2016
  ident: 10.1016/j.dsp.2025.104996_bib0034
  article-title: Multispectral pedestrian detection using deep fusion convolutional neural networks
– start-page: 3489
  year: 2021
  ident: 10.1016/j.dsp.2025.104996_bib0038
  article-title: LLVIP: A visible-infrared paired dataset for low-light vision
– volume: 27
  start-page: 2260
  issue: 10
  year: 2017
  ident: 10.1016/j.dsp.2025.104996_bib0003
  article-title: A low complexity pedestrian detection framework for smart video surveillance systems
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
  doi: 10.1109/TCSVT.2016.2581660
– volume: 30
  start-page: 1172
  year: 2023
  ident: 10.1016/j.dsp.2025.104996_bib0023
  article-title: Multi-scale aggregation transformers for multispectral object detection
  publication-title: IEEE Signal Processing Letters
  doi: 10.1109/LSP.2023.3309578
– year: 2021
  ident: 10.1016/j.dsp.2025.104996_bib0041
  article-title: An image is worth 16×16 words: transformers for image recognition at scale
– year: 2024
  ident: 10.1016/j.dsp.2025.104996_bib0070
  article-title: MambaDFuse: a mamba-based dual-phase model for multi-modality image fusion
– volume: 52
  start-page: 8574
  issue: 8
  year: 2022
  ident: 10.1016/j.dsp.2025.104996_bib0051
  article-title: Enhancing geometric factors in model learning and inference for object detection and instance segmentation
  publication-title: IEEE Trans. Cybern.
  doi: 10.1109/TCYB.2021.3095305
– start-page: 7794
  year: 2018
  ident: 10.1016/j.dsp.2025.104996_bib0049
  article-title: Non-local neural networks
– volume: 27
  start-page: 1368
  issue: 6
  year: 2017
  ident: 10.1016/j.dsp.2025.104996_bib0005
  article-title: Early detection of sudden pedestrian crossing for safe driving during summer nights
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
  doi: 10.1109/TCSVT.2016.2539684
– ident: 10.1016/j.dsp.2025.104996_bib0052
– volume: 28
  start-page: 91
  year: 2015
  ident: 10.1016/j.dsp.2025.104996_bib0053
  article-title: Faster R-CNN: Towards real-time object detection with region proposal networks
– volume: 71
  start-page: 1
  year: 2022
  ident: 10.1016/j.dsp.2025.104996_bib0025
  article-title: SwinFuse: a residual swin transformer fusion network for infrared and visible images
  publication-title: IEEE Trans. Instrum. Meas.
  doi: 10.1109/TIM.2022.3216413
– volume: 91
  start-page: 477
  year: 2023
  ident: 10.1016/j.dsp.2025.104996_bib0068
  article-title: DivFusion: darkness-free infrared and visible image fusion
  publication-title: Inf. Fusion
  doi: 10.1016/j.inffus.2022.10.034
– volume: 32
  start-page: 7632
  issue: 11
  year: 2022
  ident: 10.1016/j.dsp.2025.104996_bib0014
  article-title: MoADNet: mobile asymmetric dual-stream networks for real-time and lightweight RGB-D salient object detection
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
  doi: 10.1109/TCSVT.2022.3180274
– start-page: 403
  year: 2023
  ident: 10.1016/j.dsp.2025.104996_bib0069
  article-title: Multimodal object detection by channel switching and spatial attention
– volume: 139
  start-page: 8748
  year: 2021
  ident: 10.1016/j.dsp.2025.104996_bib0044
  article-title: Learning transferable visual models from natural language supervision
– volume: 16
  start-page: 820
  issue: 6
  year: 2016
  ident: 10.1016/j.dsp.2025.104996_bib0033
  article-title: Pedestrian detection at day/night time with visible and FIR cameras: A comparison
  publication-title: Sensors
  doi: 10.3390/s16060820
– volume: 47
  start-page: 2289
  year: 2022
  ident: 10.1016/j.dsp.2025.104996_bib0055
  article-title: IARet: A lightweight multiscale infrared aerocraft recognition algorithm
  publication-title: Arab. J. Sci. Eng.
  doi: 10.1007/s13369-021-06181-7
– start-page: 1106
  year: 2012
  ident: 10.1016/j.dsp.2025.104996_bib0016
  article-title: ImageNet classification with deep convolutional neural networks
– volume: 32
  start-page: 315
  issue: 1
  year: 2021
  ident: 10.1016/j.dsp.2025.104996_bib0022
  article-title: Deep cross-modal representation learning and distillation for illumination-invariant pedestrian detection
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
  doi: 10.1109/TCSVT.2021.3060162
– start-page: 3061
  year: 2015
  ident: 10.1016/j.dsp.2025.104996_bib0001
  article-title: Object scene flow for autonomous vehicles
– start-page: 6000
  year: 2017
  ident: 10.1016/j.dsp.2025.104996_bib0040
  article-title: Attention is all you need
– volume: 13
  start-page: 3656
  year: 2021
  ident: 10.1016/j.dsp.2025.104996_bib0059
  article-title: Visible-thermal image object detection via the combination of illumination conditions and temperature information
  publication-title: Remote Sens.
  doi: 10.3390/rs13183656
– start-page: 7073
  year: 2021
  ident: 10.1016/j.dsp.2025.104996_bib0043
  article-title: Multi-modal fusion transformer for end-to-end autonomous driving
– year: 2024
  ident: 10.1016/j.dsp.2025.104996_bib0072
  article-title: Beyond night visibility: adaptive multi-scale fusion of infrared and visible images
– volume: 34
  start-page: 3017
  issue: 4
  year: 2024
  ident: 10.1016/j.dsp.2025.104996_bib0064
  article-title: Stabilizing multispectral pedestrian detection with evidential hybrid fusion
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
  doi: 10.1109/TCSVT.2023.3306870
– volume: 145
  year: 2024
  ident: 10.1016/j.dsp.2025.104996_bib0071
  article-title: ICAFusion: iterative cross-attention guided feature fusion for multispectral object detection
  publication-title: Pattern Recognit.
  doi: 10.1016/j.patcog.2023.109913
– start-page: 5906
  year: 2023
  ident: 10.1016/j.dsp.2025.104996_bib0026
  article-title: CDDFuse: correlation-driven dual-branch feature decomposition for multi-modality image fusion
– start-page: 7132
  year: 2018
  ident: 10.1016/j.dsp.2025.104996_bib0045
  article-title: Squeeze-and-excitation networks
– volume: 61
  start-page: 1
  year: 2023
  ident: 10.1016/j.dsp.2025.104996_bib0065
  article-title: SuperYOLO: super resolution assisted object detection in multimodal remote sensing imagery
  publication-title: IEEE Trans. Geosci. Remote Sens.
– start-page: 779
  year: 2016
  ident: 10.1016/j.dsp.2025.104996_bib0020
  article-title: You only look once: unified, real-time object detection
– ident: 10.1016/j.dsp.2025.104996_bib0066
– ident: 10.1016/j.dsp.2025.104996_bib0024
  doi: 10.2139/ssrn.4227745
– year: 2014
  ident: 10.1016/j.dsp.2025.104996_bib0015
  article-title: Spatial pyramid pooling in deep convolutional networks for visual recognition
– volume: 34
  start-page: 187
  year: 2016
  ident: 10.1016/j.dsp.2025.104996_bib0037
  article-title: Vehicle detection in aerial imagery: a small target detection benchmark
  publication-title: J. Vis. Commun. Image Represent.
  doi: 10.1016/j.jvcir.2015.11.002
– volume: 491
  start-page: 414
  year: 2022
  ident: 10.1016/j.dsp.2025.104996_bib0013
  article-title: FCMNet: Frequency-aware cross-modality attention networks for RGB-D salient object detection
  publication-title: Neurocomputing
  doi: 10.1016/j.neucom.2022.04.015
– start-page: 898
  year: 2022
  ident: 10.1016/j.dsp.2025.104996_bib0027
  article-title: CAT-Det: Contrastively augmented transformer for multimodal 3D object detection
– volume: 50
  start-page: 20
  year: 2019
  ident: 10.1016/j.dsp.2025.104996_bib0009
  article-title: Cross-modality interactive attention network for multispectral pedestrian detection
  publication-title: Inf. Fusion
  doi: 10.1016/j.inffus.2018.09.015
– ident: 10.1016/j.dsp.2025.104996_bib0050
– start-page: 72
  year: 2021
  ident: 10.1016/j.dsp.2025.104996_bib0010
  article-title: Guided attentive feature fusion for multispectral pedestrian detection
– volume: 31
  start-page: 4771
  issue: 12
  year: 2021
  ident: 10.1016/j.dsp.2025.104996_bib0011
  article-title: Infrared and visible image fusion via texture conditional generative adversarial network
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
  doi: 10.1109/TCSVT.2021.3054584
– start-page: 11534
  year: 2020
  ident: 10.1016/j.dsp.2025.104996_bib0047
  article-title: ECA-Net: Efficient channel attention for deep convolutional neural networks
– start-page: 209
  year: 2014
  ident: 10.1016/j.dsp.2025.104996_bib0004
  article-title: Low resolution person detection with a moving thermal infrared camera by hot spot classification
– volume: 39
  start-page: 1137
  issue: 6
  year: 2017
  ident: 10.1016/j.dsp.2025.104996_bib0019
  article-title: Faster R-CNN: Towards real-time object detection with region proposal networks
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
  doi: 10.1109/TPAMI.2016.2577031
– start-page: 11863
  year: 2021
  ident: 10.1016/j.dsp.2025.104996_bib0048
  article-title: SimAM: a simple, parameter-free attention module for convolutional neural networks
– volume: 21
  start-page: 4184
  issue: 12
  year: 2021
  ident: 10.1016/j.dsp.2025.104996_bib0012
  article-title: Attention fusion for one-stage multispectral pedestrian detection
  publication-title: Sensors
  doi: 10.3390/s21124184
– year: 2022
  ident: 10.1016/j.dsp.2025.104996_bib0067
  article-title: Multimodal object detection via probabilistic ensembling
– volume: 27
  start-page: 1132
  issue: 5
  year: 2017
  ident: 10.1016/j.dsp.2025.104996_bib0002
  article-title: Camera self-calibration based on nonlinear optimization and applications in surveillance systems
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
  doi: 10.1109/TCSVT.2015.2511812
– start-page: 787
  year: 2020
  ident: 10.1016/j.dsp.2025.104996_bib0021
  article-title: Improving multispectral pedestrian detection by addressing modality imbalance problems
– start-page: 618
  year: 2017
  ident: 10.1016/j.dsp.2025.104996_bib0063
  article-title: Grad-CAM: visual explanations from deep networks via gradient-based localization
– start-page: 770
  year: 2016
  ident: 10.1016/j.dsp.2025.104996_bib0017
  article-title: Deep residual learning for image recognition
– volume: 13534
  year: 2022
  ident: 10.1016/j.dsp.2025.104996_bib0028
  article-title: Attention-guided multi-modal and multi-scale fusion for multispectral pedestrian detection
– ident: 10.1016/j.dsp.2025.104996_bib0029
– start-page: 1
  year: 2020
  ident: 10.1016/j.dsp.2025.104996_bib0057
  article-title: Multispectral fusion for object detection with cyclic fuse-and-refine blocks
– volume: 32
  start-page: 3360
  issue: 6
  year: 2022
  ident: 10.1016/j.dsp.2025.104996_bib0007
  article-title: UNFusion: A unified multi-scale densely connected network for infrared and visible image fusion
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
  doi: 10.1109/TCSVT.2021.3109895
– start-page: 9992
  year: 2021
  ident: 10.1016/j.dsp.2025.104996_bib0042
  article-title: Swin transformer: hierarchical vision transformer using shifted windows
– start-page: 21
  year: 2019
  ident: 10.1016/j.dsp.2025.104996_bib0018
  article-title: SSD: Single shot multibox detector
– year: 2016
  ident: 10.1016/j.dsp.2025.104996_bib0035
  article-title: Multispectral deep neural networks for pedestrian detection
– start-page: 449
  year: 2021
  ident: 10.1016/j.dsp.2025.104996_bib0056
  article-title: Deep active learning from multispectral data through cross-modality prediction inconsistency
– start-page: 3
  year: 2018
  ident: 10.1016/j.dsp.2025.104996_bib0046
  article-title: CBAM: Convolutional block attention module
– year: 2020
  ident: 10.1016/j.dsp.2025.104996_bib0061
  article-title: Vehicle detection from multi-modal aerial imagery using YOLOv3 with mid-level fusion
– volume: 32
  start-page: 6700
  issue: 10
  year: 2022
  ident: 10.1016/j.dsp.2025.104996_bib0060
  article-title: Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
  doi: 10.1109/TCSVT.2022.3168279
– ident: 10.1016/j.dsp.2025.104996_bib0039
– year: 2022
  ident: 10.1016/j.dsp.2025.104996_bib0006
  article-title: Cross-modality attention and multimodal fusion transformer for pedestrian detection
– volume: 50
  start-page: 20
  year: 2019
  ident: 10.1016/j.dsp.2025.104996_bib0036
  article-title: Cross-modality interactive attention network for multispectral pedestrian detection
  publication-title: Inf. Fusion
  doi: 10.1016/j.inffus.2018.09.015
– volume: 130
  year: 2022
  ident: 10.1016/j.dsp.2025.104996_bib0062
  article-title: Cross-modality attentive feature fusion for object detection in multispectral remote sensing imagery
  publication-title: Pattern Recognit.
  doi: 10.1016/j.patcog.2022.108786
– start-page: 72
  year: 2021
  ident: 10.1016/j.dsp.2025.104996_bib0058
  article-title: Guided attentive feature fusion for multispectral pedestrian detection
– volume: 53
  start-page: 1679
  issue: 6
  year: 2004
  ident: 10.1016/j.dsp.2025.104996_bib0032
  article-title: A shape-independent method for pedestrian detection with far-infrared images
  publication-title: IEEE Trans. Veh. Technol.
  doi: 10.1109/TVT.2004.834875
– volume: 32
  start-page: 105
  issue: 1
  year: 2022
  ident: 10.1016/j.dsp.2025.104996_bib0008
  article-title: Learning a deep multiscale feature ensemble and an edge-attention guidance for image fusion
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
  doi: 10.1109/TCSVT.2021.3056725
SSID ssj0007426
SourceID crossref
elsevier
SourceType Index Database
Publisher
StartPage 104996
SubjectTerms Attention mechanism
Cross-modality
Multimodal adaptive feature fusion
Multispectral object detection
Transformer
Title MCAFNet: Multiscale cross-modality adaptive fusion network for multispectral object detection
URI https://dx.doi.org/10.1016/j.dsp.2025.104996
Volume 159
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=MCAFNet%3A+Multiscale+cross-modality+adaptive+fusion+network+for+multispectral+object+detection&rft.jtitle=Digital+signal+processing&rft.au=Zheng%2C+Shangpo&rft.au=Junfeng%2C+Liu&rft.au=Zeng%2C+Jun&rft.date=2025-04-01&rft.pub=Elsevier+Inc&rft.issn=1051-2004&rft.volume=159&rft_id=info:doi/10.1016%2Fj.dsp.2025.104996&rft.externalDocID=S1051200425000181