Improved Feature Extraction Network for Neuro-Oriented Target Speaker Extraction

Bibliographic Details
Published in	Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) pp. 1 - 5
Main Authors	Fan, Cunhang; Gao, Youdian; Pan, Zexu; Zhang, Jingjing; Zhang, Hongyu; Zhang, Jie; Lv, Zhao
Format	Conference Proceeding
Language	English
Published	IEEE, 6 April 2025
Subjects	Brain modeling; Data mining; EEG; Electroencephalography; Feature extraction; Kolmogorov-Arnold Networks; Long Sequence Modeling; Mamba; Robustness; Signal resolution; Signal to noise ratio; Speech coding; Speech processing; Target Speaker Extraction; Time-domain analysis

More Information
Summary:	The rapid recent progress in auditory attention decoding (AAD) makes it possible to use electroencephalography (EEG) as auxiliary information for target speaker extraction. However, effectively modeling long speech sequences and resolving the identity of the target speaker from EEG signals remain major challenges. In this paper, we propose an improved feature extraction network (IFENet) for neuro-oriented target speaker extraction, which consists mainly of a speech encoder with dual-path Mamba and an EEG encoder with Kolmogorov-Arnold Networks (KAN). We propose SpeechBiMamba, which uses dual-path Mamba to model local and global speech sequences for speech feature extraction. In addition, we propose EEGKAN to effectively extract EEG features that are closely related to the auditory stimuli and to locate the target speaker through the subject's attention information. Experiments on the KUL and AVED datasets show that IFENet outperforms the state-of-the-art model, achieving relative improvements of 36% and 29%, respectively, in scale-invariant signal-to-distortion ratio (SI-SDR) under an open evaluation condition.
ISSN:	2379-190X
DOI:	10.1109/ICASSP49660.2025.10888763
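The summary says SpeechBiMamba uses dual-path Mamba to model local and global speech sequences. This record does not describe the implementation, so the following is only a minimal sketch of the generic dual-path segmentation popularized by DPRNN, assuming IFENet splits its encoded speech in a similar way. The frame sequence is cut into overlapping chunks; an intra-chunk pass models each chunk (local context), and an inter-chunk pass models each within-chunk position across all chunks (global context). The names `segment`, `transpose`, `dual_path_block` and `running_mean` are hypothetical, and `running_mean` is a toy stand-in for a Mamba layer, not a state-space model.

```python
import math
from typing import Callable, List, Sequence

Seq = List[float]


def segment(frames: Sequence[float], chunk: int, hop: int) -> List[Seq]:
    """Split a 1-D frame sequence into overlapping chunks, zero-padding the tail."""
    if not 0 < hop <= chunk:
        raise ValueError("require 0 < hop <= chunk")
    n = len(frames)
    # Smallest number of chunks whose span covers every input frame.
    num_chunks = math.ceil(max(n - chunk, 0) / hop) + 1
    padded_len = (num_chunks - 1) * hop + chunk
    padded = list(frames) + [0.0] * (padded_len - n)
    return [padded[i * hop : i * hop + chunk] for i in range(num_chunks)]


def transpose(chunks: List[Seq]) -> List[Seq]:
    """Swap the chunk axis and the within-chunk (position) axis."""
    return [list(col) for col in zip(*chunks)]


def dual_path_block(
    chunks: List[Seq],
    intra: Callable[[Seq], Seq],
    inter: Callable[[Seq], Seq],
) -> List[Seq]:
    """One dual-path step: a local pass inside chunks, then a global pass across them."""
    # Intra-chunk (local) pass: model each short chunk independently.
    local = [intra(c) for c in chunks]
    # Inter-chunk (global) pass: for each position within a chunk, model the
    # sequence of that position across all chunks, then restore the layout.
    across = [inter(col) for col in transpose(local)]
    return transpose(across)


def running_mean(seq: Seq) -> Seq:
    """Hypothetical toy sequence model (causal running mean) standing in for Mamba."""
    out, total = [], 0.0
    for i, x in enumerate(seq, start=1):
        total += x
        out.append(total / i)
    return out


if __name__ == "__main__":
    frames = [float(i) for i in range(10)]
    chunks = segment(frames, chunk=4, hop=2)
    print(chunks)
    print(dual_path_block(chunks, running_mean, running_mean))
```

The intra pass only sees sequences of length `chunk`, and the inter pass only sees sequences as long as the number of chunks. For long inputs, both are much shorter than the full sequence, which is the usual motivation for dual-path modeling. A real separator would work on multi-channel feature frames, use a learned (and possibly bidirectional) sequence model in each pass, and merge the chunks back with overlap-add.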
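The summary states only that EEGKAN is built on Kolmogorov-Arnold Networks. For reference, the original KAN formulation replaces an MLP layer's fixed nonlinearity after a linear map with a learnable univariate function $\phi_{l,j,i}$ on every input-output edge. It parameterizes each function as a SiLU base term plus a B-spline expansion. How EEGKAN parameterizes these functions is not stated in this record, so the spline form below is the original KAN choice, not necessarily IFENet's:

```latex
x_{l+1,j} \;=\; \sum_{i=1}^{n_l} \phi_{l,j,i}\!\left(x_{l,i}\right), \qquad j = 1, \dots, n_{l+1},

\phi(x) \;=\; w_b \,\mathrm{SiLU}(x) \;+\; w_s \sum_{k} c_k \, B_k(x),
```

Here $B_k$ are B-spline basis functions and $w_b$, $w_s$, $c_k$ are learned per edge. An MLP layer computes $\sigma(Wx + b)$ with one fixed $\sigma$. A KAN layer instead learns $n_l \times n_{l+1}$ separate one-dimensional functions and simply sums their outputs.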
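The reported gains are measured in SI-SDR. As a reference point, the sketch below computes SI-SDR with its standard definition in dependency-free Python. Both signals are made zero-mean, and the estimate is projected onto the target to get the scaled reference $s_{\text{target}} = (\langle \hat{s}, s\rangle / \lVert s\rVert^2)\, s$. Whatever remains, $\hat{s} - s_{\text{target}}$, counts as distortion. The function names and the `eps` floor are illustrative choices. The `relative_improvement` helper assumes the common definition (new - baseline) / |baseline|; this record does not say exactly how the 36% and 29% figures were computed.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], target: Sequence[float], eps: float = 1e-12) -> float:
    """Scale-invariant signal-to-distortion ratio (SI-SDR) in dB."""
    if len(estimate) != len(target) or len(target) == 0:
        raise ValueError("estimate and target must be non-empty and equally long")
    # Remove the mean of each signal so a DC offset does not affect the score.
    est_mean = sum(estimate) / len(estimate)
    tgt_mean = sum(target) / len(target)
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]
    tgt_energy = sum(t * t for t in tgt)
    if tgt_energy == 0.0:
        raise ValueError("target must not be constant")
    # Optimal scaling of the target: alpha = <est, tgt> / ||tgt||^2.
    alpha = sum(e * t for e, t in zip(est, tgt)) / tgt_energy
    s_target = [alpha * t for t in tgt]
    # Everything in the estimate not explained by the scaled target is distortion.
    e_noise = [e - s for e, s in zip(est, s_target)]
    # Floor both energies at eps so a perfect or silent estimate avoids log(0).
    signal_energy = max(sum(s * s for s in s_target), eps)
    noise_energy = max(sum(n * n for n in e_noise), eps)
    return 10.0 * math.log10(signal_energy / noise_energy)


def relative_improvement(new_score: float, baseline_score: float) -> float:
    """Relative change (new - baseline) / |baseline|, e.g. 0.25 means +25%."""
    if baseline_score == 0.0:
        raise ValueError("relative improvement is undefined for a zero baseline")
    return (new_score - baseline_score) / abs(baseline_score)


if __name__ == "__main__":
    target = [1.0, 0.0, -1.0, 0.0]
    # Target plus an orthogonal interference of equal energy.
    estimate = [1.0, 1.0, -1.0, -1.0]
    print(si_sdr(estimate, target))
    print(relative_improvement(10.0, 8.0))  # hypothetical dB values, not from the paper
```

The projection rescales the target, so multiplying the estimate by any nonzero constant leaves the score unchanged. This is the point of the scale-invariant variant: a model is neither rewarded nor penalized for output loudness. In the example above, an estimate equal to the target plus an orthogonal interference of equal energy scores 0 dB.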