Adaptive Knowledge Distillation With Attention-Based Multi-Modal Fusion for Robust Dim Object Detection
Automated object detection in aerial images is crucial in both civil and military applications. Existing computer vision-based object detection methods are not robust enough to precisely detect dim objects in aerial images due to the cluttered backgrounds, various observing angles, small object scales, and severe occlusions.
Published in | IEEE transactions on multimedia Vol. 27; pp. 2083 - 2096 |
---|---|
Main Authors | Lan, Zhen; Li, Zixing; Yan, Chao; Xiang, Xiaojia; Tang, Dengqing; Zhou, Han; Lai, Jun |
Format | Journal Article |
Language | English |
Published | IEEE, 01.01.2025 |
Subjects | Attention mechanism; Brain modeling; Computer vision; EEG; Electroencephalography; Feature extraction; Knowledge distillation; Object detection |
Abstract | Automated object detection in aerial images is crucial in both civil and military applications. Existing computer vision-based object detection methods are not robust enough to precisely detect dim objects in aerial images due to the cluttered backgrounds, various observing angles, small object scales, and severe occlusions. Recently, electroencephalography (EEG)-based object detection methods have received increasing attention owing to the advanced cognitive capabilities of human vision. However, how to combine the human intelligence with computer intelligence to achieve robust dim object detection is still an open question. In this paper, we propose a novel approach to efficiently fuse and exploit the properties of multi-modal data for dim object detection. Specifically, we first design a brain-computer interface (BCI) paradigm called eye-tracking-based slow serial visual presentation (ESSVP) to simultaneously collect the paired EEG and image data when subjects search for the dim objects in aerial images. Then, we develop an attention-based multi-modal fusion network to selectively aggregate the learned features of EEG and image modalities. Furthermore, we propose an adaptive multi-teacher knowledge distillation method to efficiently train the multi-modal dim object detector for better performance. To evaluate the effectiveness of our method, we conduct extensive experiments on the collected dataset in subject-dependent and subject-independent tasks. The experimental results demonstrate that the proposed dim object detection method exhibits superior effectiveness and robustness compared to the baselines and the state-of-the-art methods. |
---|---|
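The attention-based multi-modal fusion described in the abstract selectively aggregates learned EEG and image features. As a rough illustration only (the paper's actual network is a learned deep model; the function names, scalar gate scores, and plain weighted sum below are simplifying assumptions, not the authors' implementation), modality-level attention weighting can be sketched as:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_modalities(features, gate_scores):
    # features: one equal-length feature vector per modality (e.g. EEG, image).
    # gate_scores: one scalar relevance score per modality (assumed given here;
    # in a real network they would be predicted by a learned attention module).
    weights = softmax(gate_scores)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]

# Equal gate scores reduce fusion to a plain average of the two modalities.
fused = fuse_modalities([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
```

With unequal gate scores the softmax shifts weight toward the more informative modality, which is the behavior the abstract attributes to its attention-based aggregation.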
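The adaptive multi-teacher knowledge distillation mentioned in the abstract trains one multi-modal student against several teachers. A minimal sketch under strong assumptions (the record does not give the paper's adaptation rule; weighting each teacher's KL term by its softmaxed confidence on the true label is one plausible stand-in):

```python
import math

def softmax(logits, t=1.0):
    # Temperature-scaled, numerically stable softmax.
    m = max(logits)
    exps = [math.exp((x - m) / t) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_div(p, q):
    # KL(p || q) between two discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def multi_teacher_kd_loss(student_logits, teacher_logits_list, label, t=2.0):
    # Each teacher's distillation term is weighted by its confidence on the
    # true label, so more reliable teachers contribute more to the loss.
    student_p = softmax(student_logits, t)
    confidences = [softmax(tl)[label] for tl in teacher_logits_list]
    weights = softmax(confidences)
    return sum(w * kl_div(softmax(tl, t), student_p)
               for w, tl in zip(weights, teacher_logits_list))
```

When every teacher already agrees with the student the loss vanishes; a disagreeing teacher contributes a positive, confidence-weighted KL term.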
Author | Zhou, Han; Li, Zixing; Lai, Jun; Xiang, Xiaojia; Lan, Zhen; Tang, Dengqing; Yan, Chao |
Author_xml | – sequence: 1 givenname: Zhen orcidid: 0000-0001-6112-9279 surname: Lan fullname: Lan, Zhen email: lanzhen19@nudt.edu.cn organization: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China – sequence: 2 givenname: Zixing surname: Li fullname: Li, Zixing email: lizixing16@nudt.edu.cn organization: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China – sequence: 3 givenname: Chao orcidid: 0000-0002-9995-4239 surname: Yan fullname: Yan, Chao email: yanchao@nuaa.edu.cn organization: College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China – sequence: 4 givenname: Xiaojia orcidid: 0000-0002-1525-6231 surname: Xiang fullname: Xiang, Xiaojia email: xiangxiaojia@nudt.edu.cn organization: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China – sequence: 5 givenname: Dengqing orcidid: 0000-0002-8781-9101 surname: Tang fullname: Tang, Dengqing email: tangdengqing09@nudt.edu.cn organization: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China – sequence: 6 givenname: Han surname: Zhou fullname: Zhou, Han email: zhouhan@nudt.edu.cn organization: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China – sequence: 7 givenname: Jun orcidid: 0000-0003-2342-487X surname: Lai fullname: Lai, Jun email: laijun@nudt.edu.cn organization: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China |
CODEN | ITMUF8 |
ContentType | Journal Article |
DOI | 10.1109/TMM.2024.3521793 |
DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) - NZ CrossRef |
DatabaseTitle | CrossRef |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering Computer Science |
EISSN | 1941-0077 |
EndPage | 2096 |
ExternalDocumentID | 10_1109_TMM_2024_3521793 10814704 |
Genre | orig-research |
GrantInformation_xml | – fundername: Natural Science Foundation of Jiangsu Province grantid: BK20241396 funderid: 10.13039/501100004608 – fundername: National Natural Science Foundation of China grantid: 62403240 funderid: 10.13039/501100001809 |
IEDL.DBID | RIE |
ISSN | 1520-9210 |
IngestDate | Tue Jul 01 05:13:23 EDT 2025 Wed Aug 27 02:04:15 EDT 2025 |
IsPeerReviewed | true |
IsScholarly | true |
Language | English |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
LinkModel | DirectLink |
ORCID | 0000-0002-9995-4239 0000-0001-6112-9279 0000-0002-1525-6231 0000-0002-8781-9101 0000-0003-2342-487X |
PageCount | 14 |
ParticipantIDs | crossref_primary_10_1109_TMM_2024_3521793 ieee_primary_10814704 |
PublicationCentury | 2000 |
PublicationDate | 2025-01-01 |
PublicationDateYYYYMMDD | 2025-01-01 |
PublicationDate_xml | – month: 01 year: 2025 text: 2025-01-01 day: 01 |
PublicationDecade | 2020 |
PublicationTitle | IEEE transactions on multimedia |
PublicationTitleAbbrev | TMM |
PublicationYear | 2025 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SourceID | crossref ieee |
SourceType | Index Database Publisher |
StartPage | 2083 |
SubjectTerms | Attention mechanism Brain modeling Computer vision Detectors EEG Electroencephalography Emotion recognition Feature extraction knowledge distillation Object detection Search problems Training Visualization |
Title | Adaptive Knowledge Distillation With Attention-Based Multi-Modal Fusion for Robust Dim Object Detection |
URI | https://ieeexplore.ieee.org/document/10814704 |
Volume | 27 |