Frame-Online DNN-WPE Dereverberation

Signal dereverberation using the weighted prediction error (WPE) method has been proven to be an effective means to raise the accuracy of far-field speech recognition. But in its original formulation, WPE requires multiple iterations over a sufficiently long utterance, rendering it unsuitable for on...

Full description

Saved in:
Bibliographic Details
Published in2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC) pp. 466 - 470
Main Authors Heymann, Jahn, Drude, Lukas, Haeb-Umbach, Reinhold, Kinoshita, Keisuke, Nakatani, Tomohiro
Format Conference Proceeding
LanguageEnglish
Published IEEE 01.09.2018
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Signal dereverberation using the weighted prediction error (WPE) method has been proven to be an effective means to raise the accuracy of far-field speech recognition. But in its original formulation, WPE requires multiple iterations over a sufficiently long utterance, rendering it unsuitable for online low-latency applications. Recently, two methods have been proposed to overcome this limitation. One utilizes a neural network to estimate the power spectral density (PSD) of the target signal and works in a block-online fashion. The other method relies on a rather simple PSD estimation which smoothes the observed PSD and utilizes a recursive formulation which enables it to work on a frame-by-frame basis. In this paper, we integrate a deep neural network (DNN) based estimator into the recursive frame-online formulation. We evaluate the performance of the recursive system with different PSD estimators in comparison to the block-online and offline variant on two distinct corpora. The REVERB challenge data, where the signal is mainly deteriorated by reverberation, and a database which combines WSJ and VoiceHome to also consider (directed) noise sources. The results show that although smoothing works surprisingly well, the more sophisticated DNN based estimator shows promising improvements and shortens the performance gap between online and offline processing.
DOI:10.1109/IWAENC.2018.8521255