Brain-informed speech separation (BISS) for enhancement of target speaker in multitalker speech perception

Bibliographic Details
Published in: NeuroImage (Orlando, Fla.), Vol. 223, p. 117282
Main Authors: Ceolini, Enea, Hjortkjær, Jens, Wong, Daniel D.E., O’Sullivan, James, Raghavan, Vinay S., Herrero, Jose, Mehta, Ashesh D., Liu, Shih-Chii, Mesgarani, Nima
Format: Journal Article
Language: English
Published: United States, Elsevier Inc., 01.12.2020

Summary:
•Multi-talker speech perception is challenging for people with hearing loss.
•Automatic speech separation cannot help without first identifying the target speaker.
•We used the brain signal of listeners to jointly identify and extract target speech.
•This method eliminates the need for separating sound sources or knowing their number.
•We show the efficacy of this method in both normal and hearing impaired subjects.
Hearing-impaired people often struggle to follow the speech stream of an individual talker in noisy environments. Recent studies show that the brain tracks attended speech and that the attended talker can be decoded from neural data on a single-trial level. This raises the possibility of “neuro-steered” hearing devices in which the brain-decoded intention of a hearing-impaired listener is used to enhance the voice of the attended speaker from a speech separation front-end. So far, methods that use this paradigm have focused on optimizing the brain decoding and the acoustic speech separation independently. In this work, we propose a novel framework called brain-informed speech separation (BISS), in which the information about the attended speech, as decoded from the subject’s brain, is directly used to perform speech separation in the front-end. We present a deep learning model that uses neural data to extract the clean audio signal that a listener is attending to from a multi-talker speech mixture. We show that the framework can be applied successfully to the decoded output from either invasive intracranial electroencephalography (iEEG) or non-invasive electroencephalography (EEG) recordings from hearing-impaired subjects. It also results in improved speech separation, even in scenes with background noise. The generalization capability of the system renders it a perfect candidate for neuro-steered hearing-assistive devices.
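The abstract describes two coupled stages: an attended-speech signal decoded from EEG/iEEG, and a separation front-end that uses that decoded signal directly to extract the attended talker from the mixture. The sketch below is a minimal illustration of that idea only, assuming a time-lagged linear backward (stimulus-reconstruction) decoder for the envelope and an LSTM mask estimator conditioned on the decoded envelope; every name, shape, and hyperparameter here (decode_attended_envelope, BissSeparator, 64 channels, 32 lags, 257 frequency bins) is a hypothetical stand-in, not the authors' implementation.

# Hypothetical sketch of a BISS-style pipeline: (1) decode the attended-speech
# envelope from neural recordings, (2) condition a separation model on it.
# All names, shapes, and hyperparameters are illustrative assumptions.

import numpy as np
import torch
import torch.nn as nn


def decode_attended_envelope(eeg, decoder_weights, lags=32):
    """Backward (stimulus-reconstruction) model: estimate the attended speech
    envelope from multi-channel EEG with a time-lagged linear decoder.

    eeg: array of shape (time, channels)
    decoder_weights: array of shape (lags * channels,), assumed pre-trained
    """
    n_t, n_ch = eeg.shape
    lagged = np.zeros((n_t, lags * n_ch))
    for lag in range(lags):
        # Shift the EEG by `lag` samples and place it in the design matrix.
        lagged[lag:, lag * n_ch:(lag + 1) * n_ch] = eeg[:n_t - lag]
    return lagged @ decoder_weights  # (time,) decoded envelope estimate


class BissSeparator(nn.Module):
    """Toy mask-based separator conditioned on the decoded envelope: the
    mixture spectrogram and the envelope are concatenated frame by frame,
    and a recurrent network predicts a mask for the attended speaker."""

    def __init__(self, n_freq=257, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(n_freq + 1, hidden, num_layers=2, batch_first=True)
        self.to_mask = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, mix_spec, envelope):
        # mix_spec: (batch, frames, n_freq) magnitude spectrogram of the mixture
        # envelope: (batch, frames) decoded attended-speech envelope
        x = torch.cat([mix_spec, envelope.unsqueeze(-1)], dim=-1)
        h, _ = self.rnn(x)
        return self.to_mask(h) * mix_spec  # enhanced (masked) spectrogram


# Usage with random placeholder data (no real EEG or audio involved).
eeg = np.random.randn(500, 64)                     # 500 samples, 64 channels
weights = np.random.randn(32 * 64)                 # stand-in decoder weights
envelope = decode_attended_envelope(eeg, weights)

model = BissSeparator()
mixture = torch.rand(1, 500, 257)
env = torch.tensor(envelope, dtype=torch.float32).unsqueeze(0)
enhanced = model(mixture, env)                     # shape: (1, 500, 257)

Conditioning the separator directly on the brain-decoded envelope is what lets this kind of system skip an explicit separate-then-select step and avoid assuming a known number of sources, as the highlights note; the decoder weights and the LSTM mask estimator above are placeholders for whatever decoding and separation networks a real neuro-steered device would use.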
CRediT authorship contribution statement
Enea Ceolini: Conceptualization, Formal analysis, Writing - original draft, Writing - review & editing. Jens Hjortkjær: Conceptualization, Formal analysis. Daniel D.E. Wong: Conceptualization, Formal analysis. James O’Sullivan: Formal analysis. Vinay S. Raghavan: Formal analysis. Jose Herrero: Data curation. Ashesh D. Mehta: Data curation. Shih-Chii Liu: Supervision. Nima Mesgarani: Conceptualization, Formal analysis.
Authors contribution
EC, DW, JeH and NM participated equally in the development of the idea. EC and NM were responsible for the speech separation. DW, JeH and VR were responsible for the EEG analysis. JO and NM were responsible for the iEEG analysis. JoH and AM were responsible for the iEEG data collection. EC created the figures, performed statistical analyses, wrote parts of the paper, and was responsible for the overall paper. SL provided critical feedback on the paper.
ISSN:1053-8119
EISSN:1095-9572
DOI:10.1016/j.neuroimage.2020.117282