Cross-modal guiding and reweighting network for multi-modal RSVP-based target detection


Bibliographic Details
Published in: Neural Networks, Vol. 161, pp. 65-82
Main Authors: Mao, Jiayu; Qiu, Shuang; Wei, Wei; He, Huiguang
Format: Journal Article
Language: English
Published: United States, Elsevier Ltd, 01.04.2023
Summary: Rapid Serial Visual Presentation (RSVP)-based Brain–Computer Interfaces (BCIs) facilitate the high-throughput detection of rare target images by detecting evoked event-related potentials (ERPs). At present, the decoding accuracy of RSVP-based BCI systems limits their practical application. This study introduces eye movements (gaze and pupil information), referred to as the EYE modality, as an additional source of information to combine with the EEG-based BCI, forming a novel system for detecting target images in RSVP tasks. We performed an RSVP experiment, recorded EEG signals and eye movements simultaneously during a target detection task, and constructed a multi-modal dataset of 20 subjects. We also propose a cross-modal guiding and fusion network that exploits both the EEG and EYE modalities and fuses them for better RSVP decoding performance. In this network, a two-branch backbone extracts features from the two modalities. A Cross-Modal Feature Guiding (CMFG) module guides EYE modality features to complement the EEG modality for better feature extraction. A Multi-scale Multi-modal Reweighting (MMR) module enhances the multi-modal features by exploring intra- and inter-modal interactions. A Dual Activation Fusion (DAF) module modulates the enhanced multi-modal features for effective fusion. The proposed network achieved a balanced accuracy of 88.00% (±2.29) on the collected dataset. Ablation studies and visualizations demonstrated the effectiveness of the proposed modules. This work indicates that introducing the EYE modality is effective in RSVP tasks, and that the proposed network is a promising method for RSVP decoding that further improves the performance of RSVP-based target detection systems.
• We design and conduct RSVP experiments to collect EEG and eye-movement data.
• A cross-modal guiding and reweighting network utilizes multi-modal information.
• The proposed network outperforms existing comparable methods in RSVP tasks.
• Visualizations and ablation studies verify the effectiveness of the proposed modules.
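The summary reports performance as balanced accuracy, a natural choice for RSVP tasks because targets are rare and plain accuracy would reward a classifier that always predicts "non-target". The record does not state how the authors computed the metric, so the following is only a minimal sketch assuming the standard binary definition (the mean of sensitivity and specificity). The function name and the toy labels are hypothetical and are not data from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (target recall) and specificity (non-target recall).

    Assumes binary labels: 1 = target image, 0 = non-target image.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # targets detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # targets missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-targets rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("y_true must contain both classes")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy example (made-up labels): 2 targets among 10 images.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)

# A trivial classifier that never reports a target.
always_non_target = balanced_accuracy(y_true, [0] * len(y_true))
```

In this toy case plain accuracy is 8 of 10, while `score` is lower because one of the two targets was missed; `always_non_target` sits at chance level even though its plain accuracy is also 8 of 10.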
ISSN: 0893-6080, 1879-2782
DOI: 10.1016/j.neunet.2023.01.009