Non-intrusive binaural speech recognition prediction for hearing aid processing
Published in | Speech communication Vol. 170; p. 103202 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | Elsevier B.V., 01.05.2025 |
Summary: | Hearing aids (HAs) often feature different signal processing algorithms to optimize speech recognition (SR) in a given acoustic environment. In this paper, we explore whether models that predict the SR performance of hearing-impaired (HI), aided users can be used to automatically select the best algorithm. To this end, SR experiments are conducted with 19 HI subjects who are aided with an open-source HA. Listeners’ SR is measured in virtual, complex acoustic scenes with two distinct noise conditions using the different speech enhancement strategies implemented in this HA. For model-based selection, we apply a PHOneme-based Binaural Intelligibility model (PHOBI) based on our previous work and extended with a component for simulating hearing loss. The non-intrusive model uses a deep neural network to predict phone probabilities; the deterioration of these phone representations in the presence of noise or, more generally, signal degradation is quantified and used as the model output. The PHOBI model is trained on 960 h of English speech signals and a broad range of noise signals and room impulse responses. The performance of model-based algorithm selection is measured with two metrics: (i) its ability to rank the HA algorithms in the order of the subjective SR results and (ii) the SR difference between the measured best algorithm and the model-based selection (ΔSR). Results are compared to selections obtained with one non-intrusive and two intrusive models. PHOBI outperforms the non-intrusive model and one of the intrusive models in both noise conditions, achieving significantly higher correlations (r = 0.63 and 0.80). Its ΔSR scores are significantly lower (better) than those of the non-intrusive baseline (3.5% and 4.6% versus 8.6% and 9.8%, respectively). The ΔSR results of PHOBI and the intrusive models are not statistically different, although PHOBI operates on the observed signal alone and does not require a clean reference signal.
• A DNN-based model accurately predicts the hearing aid algorithm that optimizes speech recognition for its user.
• Individual predictions are made for 19 hearing-impaired, aided users in complex acoustic scenes.
• The DNN-based approach is non-intrusive and performs as well as established, intrusive models for speech recognition prediction. |
---|---|
ISSN: | 0167-6393 |
DOI: | 10.1016/j.specom.2025.103202 |
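The two evaluation metrics named in the summary can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that model scores are oriented so that a higher score means better predicted SR, and that ranking agreement is measured with a Pearson correlation; the summary does not state the correlation type. The function names, algorithm labels, and numbers are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def delta_sr(measured_sr, model_score):
    """SR difference between the measured best algorithm and the model's pick.

    measured_sr: dict mapping algorithm name -> measured SR in percent.
    model_score: dict mapping algorithm name -> model output
                 (assumption: higher means better predicted SR).
    """
    selected = max(model_score, key=model_score.get)  # model-based selection
    best = max(measured_sr.values())                  # measured best algorithm
    return best - measured_sr[selected]


# Hypothetical data for one listener and one noise condition.
measured = {"alg_a": 60.0, "alg_b": 72.0, "alg_c": 68.0}
predicted = {"alg_a": 0.40, "alg_b": 0.55, "alg_c": 0.60}

algorithms = sorted(measured)
r = pearson_r([measured[a] for a in algorithms],
              [predicted[a] for a in algorithms])
print(f"correlation: {r:.2f}, delta SR: {delta_sr(measured, predicted):.1f} %")
```

A ΔSR of zero means the model picked the algorithm that was also best in the listening test; larger values mean more SR is lost by following the model's selection.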