Brain-informed speech separation (BISS) for enhancement of target speaker in multitalker speech perception
Published in: NeuroImage (Orlando, Fla.), Vol. 223, p. 117282
Main Authors: Enea Ceolini, Jens Hjortkjær, Daniel D.E. Wong, James O’Sullivan, Vinay S. Raghavan, Jose Herrero, Ashesh D. Mehta, Shih-Chii Liu, Nima Mesgarani
Format: Journal Article
Language: English
Published: United States: Elsevier Inc., 01.12.2020
Summary:
• Multi-talker speech perception is challenging for people with hearing loss.
• Automatic speech separation cannot help without first identifying the target speaker.
• We used the brain signal of listeners to jointly identify and extract target speech.
• This method eliminates the need for separating sound sources or knowing their number.
• We show the efficacy of this method in both normal-hearing and hearing-impaired subjects.
Hearing-impaired people often struggle to follow the speech stream of an individual talker in noisy environments. Recent studies show that the brain tracks attended speech and that the attended talker can be decoded from neural data on a single-trial level. This raises the possibility of “neuro-steered” hearing devices in which the brain-decoded intention of a hearing-impaired listener is used to enhance the voice of the attended speaker from a speech separation front-end. So far, methods that use this paradigm have focused on optimizing the brain decoding and the acoustic speech separation independently. In this work, we propose a novel framework called brain-informed speech separation (BISS), in which the information about the attended speech, as decoded from the subject’s brain, is directly used to perform speech separation in the front-end. We present a deep learning model that uses neural data to extract the clean audio signal that a listener is attending to from a multi-talker speech mixture. We show that the framework can be applied successfully to the decoded output from either invasive intracranial electroencephalography (iEEG) or non-invasive electroencephalography (EEG) recordings from hearing-impaired subjects. It also results in improved speech separation, even in scenes with background noise. The generalization capability of the system renders it a perfect candidate for neuro-steered hearing-assistive devices.
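To make the idea concrete, below is a minimal, hypothetical PyTorch sketch of a speaker-extraction network conditioned on a brain-decoded envelope of the attended talker. The class name, layer sizes, spectrogram-masking formulation, and fusion strategy are illustrative assumptions, not the architecture reported in the paper.

```python
# Conceptual sketch (not the authors' published model): a mask-based separator
# operating on the mixture magnitude spectrogram, conditioned frame-by-frame on
# a brain-decoded envelope of the attended speech. All sizes are assumptions.
import torch
import torch.nn as nn

class BrainInformedSeparator(nn.Module):
    def __init__(self, n_freq=257, env_dim=1, hidden=256):
        super().__init__()
        # Encode mixture frames with a BiLSTM, project the decoded envelope,
        # fuse both streams, and predict a soft time-frequency mask.
        self.mix_rnn = nn.LSTM(n_freq, hidden, num_layers=2,
                               batch_first=True, bidirectional=True)
        self.env_proj = nn.Linear(env_dim, hidden)
        self.mask_head = nn.Sequential(
            nn.Linear(2 * hidden + hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_freq),
            nn.Sigmoid(),              # mask values in [0, 1] per T-F bin
        )

    def forward(self, mix_mag, envelope):
        # mix_mag:  (batch, frames, n_freq) magnitude spectrogram of the mixture
        # envelope: (batch, frames, 1) brain-decoded envelope of attended speech
        h_mix, _ = self.mix_rnn(mix_mag)
        h_env = self.env_proj(envelope)
        mask = self.mask_head(torch.cat([h_mix, h_env], dim=-1))
        return mask * mix_mag          # estimated attended-speaker magnitude

if __name__ == "__main__":
    model = BrainInformedSeparator()
    mix = torch.rand(4, 100, 257)      # toy batch: 4 utterances, 100 frames
    env = torch.rand(4, 100, 1)        # matching decoded envelopes
    print(model(mix, env).shape)       # torch.Size([4, 100, 257])
```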
Bibliography:
CRediT authorship contribution statement: Enea Ceolini: Conceptualization, Formal analysis, Writing - original draft, Writing - review & editing. Jens Hjortkjær: Conceptualization, Formal analysis. Daniel D.E. Wong: Conceptualization, Formal analysis. James O’Sullivan: Formal analysis. Vinay S. Raghavan: Formal analysis. Jose Herrero: Data curation. Ashesh D. Mehta: Data curation. Shih-Chii Liu: Supervision. Nima Mesgarani: Conceptualization, Formal analysis.
Authors’ contributions: EC, DW, JeH and NM participated equally in the development of the idea. EC and NM were responsible for the speech separation. DW, JeH and VR were responsible for the EEG analysis. JO and NM were responsible for the iEEG analysis. JoH and AM were responsible for the iEEG data collection. EC created the figures, performed statistical analyses, wrote parts of the paper, and was responsible for the overall paper. SL provided critical feedback on the paper.
ISSN: 1053-8119, 1095-9572
DOI: 10.1016/j.neuroimage.2020.117282