Adaptive Knowledge Distillation With Attention-Based Multi-Modal Fusion for Robust Dim Object Detection

Bibliographic Details
Published in IEEE Transactions on Multimedia, Vol. 27, pp. 2083–2096
Main Authors Lan, Zhen; Li, Zixing; Yan, Chao; Xiang, Xiaojia; Tang, Dengqing; Zhou, Han; Lai, Jun
Format Journal Article
Language English
Published IEEE 01.01.2025
Subjects

Abstract Automated object detection in aerial images is crucial in both civil and military applications. Existing computer vision-based object detection methods are not robust enough to precisely detect dim objects in aerial images due to cluttered backgrounds, varying viewing angles, small object scales, and severe occlusions. Recently, electroencephalography (EEG)-based object detection methods have received increasing attention owing to the advanced cognitive capabilities of human vision. However, how to combine human intelligence with computer intelligence to achieve robust dim object detection remains an open question. In this paper, we propose a novel approach to efficiently fuse and exploit the properties of multi-modal data for dim object detection. Specifically, we first design a brain-computer interface (BCI) paradigm called eye-tracking-based slow serial visual presentation (ESSVP) to simultaneously collect paired EEG and image data while subjects search for dim objects in aerial images. Then, we develop an attention-based multi-modal fusion network to selectively aggregate the learned features of the EEG and image modalities. Furthermore, we propose an adaptive multi-teacher knowledge distillation method to efficiently train the multi-modal dim object detector for better performance. To evaluate the effectiveness of our method, we conduct extensive experiments on the collected dataset in subject-dependent and subject-independent tasks. The experimental results demonstrate that the proposed dim object detection method exhibits superior effectiveness and robustness compared to the baselines and state-of-the-art methods.
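The abstract names two core mechanisms: an attention-style gate that weights the EEG and image features before fusion, and an adaptive scheme that weights multiple teachers during knowledge distillation. The sketch below is purely illustrative and is not the paper's implementation; all function names, and the heuristic of weighting teachers by a softmax over their negative batch losses, are assumptions introduced here to make the two ideas concrete.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def gated_fusion(eeg_feat, img_feat, gate_logits):
    """Attention-style fusion: a two-way softmax gate weights the EEG
    and image feature vectors before summing them element-wise."""
    w_eeg, w_img = softmax(gate_logits)
    return [w_eeg * e + w_img * i for e, i in zip(eeg_feat, img_feat)]

def adaptive_teacher_weights(teacher_losses):
    """Hypothetical adaptive weighting: teachers that fit the current
    batch better (lower loss) receive larger distillation weights."""
    return softmax([-loss for loss in teacher_losses])

def distillation_target(teacher_probs, weights):
    """Weighted average of the teachers' softened class distributions,
    used as the soft target the student is trained to match."""
    n_classes = len(teacher_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, teacher_probs))
        for c in range(n_classes)
    ]
```

For example, with per-batch teacher losses `[0.9, 0.3]` the second teacher receives the larger weight, and the resulting soft target is still a valid probability distribution (its entries sum to 1).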
Author Zhou, Han
Li, Zixing
Lai, Jun
Xiang, Xiaojia
Lan, Zhen
Tang, Dengqing
Yan, Chao
Author_xml – sequence: 1
  givenname: Zhen
  orcidid: 0000-0001-6112-9279
  surname: Lan
  fullname: Lan, Zhen
  email: lanzhen19@nudt.edu.cn
  organization: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China
– sequence: 2
  givenname: Zixing
  surname: Li
  fullname: Li, Zixing
  email: lizixing16@nudt.edu.cn
  organization: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China
– sequence: 3
  givenname: Chao
  orcidid: 0000-0002-9995-4239
  surname: Yan
  fullname: Yan, Chao
  email: yanchao@nuaa.edu.cn
  organization: College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China
– sequence: 4
  givenname: Xiaojia
  orcidid: 0000-0002-1525-6231
  surname: Xiang
  fullname: Xiang, Xiaojia
  email: xiangxiaojia@nudt.edu.cn
  organization: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China
– sequence: 5
  givenname: Dengqing
  orcidid: 0000-0002-8781-9101
  surname: Tang
  fullname: Tang, Dengqing
  email: tangdengqing09@nudt.edu.cn
  organization: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China
– sequence: 6
  givenname: Han
  surname: Zhou
  fullname: Zhou, Han
  email: zhouhan@nudt.edu.cn
  organization: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China
– sequence: 7
  givenname: Jun
  orcidid: 0000-0003-2342-487X
  surname: Lai
  fullname: Lai, Jun
  email: laijun@nudt.edu.cn
  organization: College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China
CODEN ITMUF8
ContentType Journal Article
DOI 10.1109/TMM.2024.3521793
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL) - NZ
CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Computer Science
EISSN 1941-0077
EndPage 2096
ExternalDocumentID 10_1109_TMM_2024_3521793
10814704
Genre orig-research
GrantInformation_xml – fundername: Natural Science Foundation of Jiangsu Province
  grantid: BK20241396
  funderid: 10.13039/501100004608
– fundername: National Natural Science Foundation of China
  grantid: 62403240
  funderid: 10.13039/501100001809
ISSN 1520-9210
IsPeerReviewed true
IsScholarly true
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0000-0002-9995-4239
0000-0001-6112-9279
0000-0002-1525-6231
0000-0002-8781-9101
0000-0003-2342-487X
PageCount 14
ParticipantIDs crossref_primary_10_1109_TMM_2024_3521793
ieee_primary_10814704
PublicationCentury 2000
PublicationDate 2025-01-01
PublicationDateYYYYMMDD 2025-01-01
PublicationDate_xml – month: 01
  year: 2025
  text: 2025-01-01
  day: 01
PublicationDecade 2020
PublicationTitle IEEE transactions on multimedia
PublicationTitleAbbrev TMM
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
SourceID crossref
ieee
SourceType Index Database
Publisher
StartPage 2083
SubjectTerms Attention mechanism
Brain modeling
Computer vision
Detectors
EEG
Electroencephalography
Emotion recognition
Feature extraction
knowledge distillation
Object detection
Search problems
Training
Visualization
Title Adaptive Knowledge Distillation With Attention-Based Multi-Modal Fusion for Robust Dim Object Detection
URI https://ieeexplore.ieee.org/document/10814704
Volume 27