Adaptive Knowledge Distillation With Attention-Based Multi-Modal Fusion for Robust Dim Object Detection
Automated object detection in aerial images is crucial in both civil and military applications. Existing computer vision-based object detection methods are not robust enough to precisely detect dim objects in aerial images due to the cluttered backgrounds, various observing angles, small object scales, and severe occlusions.
Published in | IEEE transactions on multimedia Vol. 27; pp. 2083 - 2096 |
---|---|
Main Authors | Lan, Zhen; Li, Zixing; Yan, Chao; Xiang, Xiaojia; Tang, Dengqing; Zhou, Han; Lai, Jun |
Format | Journal Article |
Language | English |
Published | IEEE, 01.01.2025 |
Subjects | Attention mechanism; Brain modeling; Computer vision; EEG; Electroencephalography; Feature extraction; Knowledge distillation; Object detection |
Abstract | Automated object detection in aerial images is crucial in both civil and military applications. Existing computer vision-based object detection methods are not robust enough to precisely detect dim objects in aerial images due to the cluttered backgrounds, various observing angles, small object scales, and severe occlusions. Recently, electroencephalography (EEG)-based object detection methods have received increasing attention owing to the advanced cognitive capabilities of human vision. However, how to combine the human intelligence with computer intelligence to achieve robust dim object detection is still an open question. In this paper, we propose a novel approach to efficiently fuse and exploit the properties of multi-modal data for dim object detection. Specifically, we first design a brain-computer interface (BCI) paradigm called eye-tracking-based slow serial visual presentation (ESSVP) to simultaneously collect the paired EEG and image data when subjects search for the dim objects in aerial images. Then, we develop an attention-based multi-modal fusion network to selectively aggregate the learned features of EEG and image modalities. Furthermore, we propose an adaptive multi-teacher knowledge distillation method to efficiently train the multi-modal dim object detector for better performance. To evaluate the effectiveness of our method, we conduct extensive experiments on the collected dataset in subject-dependent and subject-independent tasks. The experimental results demonstrate that the proposed dim object detection method exhibits superior effectiveness and robustness compared to the baselines and the state-of-the-art methods. |
---|---|
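The attention-based multi-modal fusion described in the abstract selectively aggregates learned EEG and image features. As a rough illustration only (the paper's actual network is a learned deep model; the function names, scalar gate scores, and plain weighted sum below are simplifying assumptions, not the authors' implementation), modality-level attention weighting can be sketched as:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_modalities(features, gate_scores):
    # features: one equal-length feature vector per modality (e.g. EEG, image).
    # gate_scores: one scalar relevance score per modality (assumed given here;
    # in a real network they would be predicted by a learned attention module).
    weights = softmax(gate_scores)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]

# Equal gate scores reduce fusion to a plain average of the two modalities.
fused = fuse_modalities([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
```

With unequal gate scores the softmax shifts weight toward the more informative modality, which is the behavior the abstract attributes to its attention-based aggregation.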
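The adaptive multi-teacher knowledge distillation mentioned in the abstract trains one multi-modal student against several teachers. A minimal sketch under strong assumptions (the record does not give the paper's adaptation rule; weighting each teacher's KL term by its softmaxed confidence on the true label is one plausible stand-in):

```python
import math

def softmax(logits, t=1.0):
    # Temperature-scaled, numerically stable softmax.
    m = max(logits)
    exps = [math.exp((x - m) / t) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_div(p, q):
    # KL(p || q) between two discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def multi_teacher_kd_loss(student_logits, teacher_logits_list, label, t=2.0):
    # Each teacher's distillation term is weighted by its confidence on the
    # true label, so more reliable teachers contribute more to the loss.
    student_p = softmax(student_logits, t)
    confidences = [softmax(tl)[label] for tl in teacher_logits_list]
    weights = softmax(confidences)
    return sum(w * kl_div(softmax(tl, t), student_p)
               for w, tl in zip(weights, teacher_logits_list))
```

When every teacher already agrees with the student the loss vanishes; a disagreeing teacher contributes a positive, confidence-weighted KL term.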
Author | Zhou, Han; Li, Zixing; Lai, Jun; Xiang, Xiaojia; Lan, Zhen; Tang, Dengqing; Yan, Chao |
Author_xml | – sequence: 1 givenname: Zhen orcidid: 0000-0001-6112-9279 surname: Lan fullname: Lan, Zhen email: lanzhen19@nudt.edu.cn organization: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China – sequence: 2 givenname: Zixing surname: Li fullname: Li, Zixing email: lizixing16@nudt.edu.cn organization: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China – sequence: 3 givenname: Chao orcidid: 0000-0002-9995-4239 surname: Yan fullname: Yan, Chao email: yanchao@nuaa.edu.cn organization: College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China – sequence: 4 givenname: Xiaojia orcidid: 0000-0002-1525-6231 surname: Xiang fullname: Xiang, Xiaojia email: xiangxiaojia@nudt.edu.cn organization: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China – sequence: 5 givenname: Dengqing orcidid: 0000-0002-8781-9101 surname: Tang fullname: Tang, Dengqing email: tangdengqing09@nudt.edu.cn organization: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China – sequence: 6 givenname: Han surname: Zhou fullname: Zhou, Han email: zhouhan@nudt.edu.cn organization: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China – sequence: 7 givenname: Jun orcidid: 0000-0003-2342-487X surname: Lai fullname: Lai, Jun email: laijun@nudt.edu.cn organization: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China |
CODEN | ITMUF8 |
ContentType | Journal Article |
DOI | 10.1109/TMM.2024.3521793 |
DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) - NZ CrossRef |
DatabaseTitle | CrossRef |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering Computer Science |
EISSN | 1941-0077 |
EndPage | 2096 |
ExternalDocumentID | 10_1109_TMM_2024_3521793 10814704 |
Genre | orig-research |
GrantInformation_xml | – fundername: Natural Science Foundation of Jiangsu Province grantid: BK20241396 funderid: 10.13039/501100004608 – fundername: National Natural Science Foundation of China grantid: 62403240 funderid: 10.13039/501100001809 |
IEDL.DBID | RIE |
ISSN | 1520-9210 |
IngestDate | Tue Jul 01 05:13:23 EDT 2025 Wed Aug 27 02:04:15 EDT 2025 |
IsPeerReviewed | true |
IsScholarly | true |
Language | English |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
LinkModel | DirectLink |
ORCID | 0000-0002-9995-4239 0000-0001-6112-9279 0000-0002-1525-6231 0000-0002-8781-9101 0000-0003-2342-487X |
PageCount | 14 |
ParticipantIDs | crossref_primary_10_1109_TMM_2024_3521793 ieee_primary_10814704 |
PublicationCentury | 2000 |
PublicationDate | 2025-01-01 |
PublicationDateYYYYMMDD | 2025-01-01 |
PublicationDate_xml | – month: 01 year: 2025 text: 2025-01-01 day: 01 |
PublicationDecade | 2020 |
PublicationTitle | IEEE transactions on multimedia |
PublicationTitleAbbrev | TMM |
PublicationYear | 2025 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SourceID | crossref ieee |
SourceType | Index Database Publisher |
StartPage | 2083 |
SubjectTerms | Attention mechanism Brain modeling Computer vision Detectors EEG Electroencephalography Emotion recognition Feature extraction knowledge distillation Object detection Search problems Training Visualization |
Title | Adaptive Knowledge Distillation With Attention-Based Multi-Modal Fusion for Robust Dim Object Detection |
URI | https://ieeexplore.ieee.org/document/10814704 |
Volume | 27 |