Non-intrusive binaural speech recognition prediction for hearing aid processing

Bibliographic Details
Published in	Speech communication Vol. 170; p. 103202
Main Authors	Roßbach, Jana; Westhausen, Nils L.; Kayser, Hendrik; Meyer, Bernd T.
Format	Journal Article
Language	English
Published	Elsevier B.V. 01.05.2025
Subjects	Binaural; Deep neural network; Non-intrusive; Speech recognition prediction
Summary:	Hearing aids (HAs) often feature different signal processing algorithms to optimize speech recognition (SR) in a given acoustic environment. In this paper, we explore whether models that predict the SR performance of hearing-impaired (HI), aided users can be used to automatically select the best algorithm. To this end, SR experiments are conducted with 19 HI subjects who are aided with an open-source HA. Listeners’ SR is measured in virtual, complex acoustic scenes with two distinct noise conditions using the different speech enhancement strategies implemented in this HA. For model-based selection, we apply a PHOneme-based Binaural Intelligibility model (PHOBI) that is based on our previous work and extended with a component for simulating hearing loss. The non-intrusive model uses a deep neural network to predict phone probabilities; the deterioration of these phone representations in the presence of noise, or of signal degradation in general, is quantified and used as the model output. The PHOBI model is trained with 960 h of English speech signals and a broad range of noise signals and room impulse responses. The performance of model-based algorithm selection is measured with two metrics: (i) its ability to rank the HA algorithms in the order of the subjective SR results and (ii) the SR difference between the measured best algorithm and the model-based selection (ΔSR). Results are compared to selections obtained with one non-intrusive and two intrusive models. PHOBI outperforms the non-intrusive model and one of the intrusive models in both noise conditions, achieving significantly higher correlations (r=0.63 and 0.80). Its ΔSR scores are significantly lower (better) than those of the non-intrusive baseline (3.5% and 4.6% against 8.6% and 9.8%, respectively). The ΔSR results of PHOBI and the intrusive models are not statistically different, although PHOBI operates on the observed signal alone and does not require a clean reference signal.
• A DNN-based model accurately predicts the hearing aid algorithm that optimizes speech recognition for its user.
• Individual predictions are made for 19 hearing-impaired, aided users in complex acoustic scenes.
• The DNN-based approach is non-intrusive and performs as well as established, intrusive models for speech recognition prediction.
ISSN:	0167-6393
DOI:	10.1016/j.specom.2025.103202
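
The two evaluation metrics named in the summary, ranking agreement and ΔSR, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The algorithm names and all numbers are made up, the helper functions are hypothetical, a higher model score is assumed to mean better predicted SR, and Spearman rank correlation is assumed as the ranking measure because the record does not say which correlation coefficient the reported r values refer to.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def delta_sr(measured, predicted):
    """Measured SR of the best algorithm minus measured SR of the
    algorithm the model would select (highest predicted score)."""
    selected = max(predicted, key=predicted.get)
    return max(measured.values()) - measured[selected]


# Made-up values for one listener in one noise condition.
measured = {"alg_a": 62.0, "alg_b": 75.0, "alg_c": 70.0}   # SR in percent
predicted = {"alg_a": 0.40, "alg_b": 0.55, "alg_c": 0.60}  # model scores

algorithms = sorted(measured)
rho = spearman([predicted[a] for a in algorithms],
               [measured[a] for a in algorithms])
gap = delta_sr(measured, predicted)
print(f"rank correlation: {rho:.2f}, delta SR: {gap:.1f} percentage points")
```

In this toy case the model selects `alg_c` although `alg_b` was measured best, so ΔSR is the difference between the two measured scores. A ΔSR of zero would mean the model picked the listener's best algorithm.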