DNN-Based Feature Enhancement Using DOA-Constrained ICA for Robust Speech Recognition

The performance of automatic speech recognition (ASR) system is often degraded in adverse real-world environments. In recent times, deep learning has successfully emerged as a breakthrough for acoustic modeling in ASR; accordingly, deep neural network (DNN)-based speech feature enhancement (FE) appr...

Full description

Saved in:
Bibliographic Details
Published inIEEE signal processing letters Vol. 23; no. 8; pp. 1091 - 1095
Main Authors Lee, Ho-Yong, Cho, Ji-Won, Kim, Minook, Park, Hyung-Min
Format Journal Article
LanguageEnglish
Published New York IEEE 01.08.2016
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:The performance of automatic speech recognition (ASR) system is often degraded in adverse real-world environments. In recent times, deep learning has successfully emerged as a breakthrough for acoustic modeling in ASR; accordingly, deep neural network (DNN)-based speech feature enhancement (FE) approaches have attracted much attention owing to their powerful modeling capabilities. However, DNN-based approaches are unable to achieve remarkable performance improvements for speech with severe distortion in the test environments different from training environments. In this letter, we propose a DNN-based FE method where the DNN inputs include preenhanced spectral features computed from multichannel input signals to reconstruct noise-robust features. The preenhanced spectral features are obtained by direction-of-arrival (DOA)-constrained independent component analysis (DCICA) followed by Bayesian FE using a hidden-Markov-model prior, to exploit the capabilities of efficient online target speech extraction and efficient FE with prior information for robust ASR. In addition, noise spectral features computed from DCICA are included for further improvement. Therefore, the DNN is trained to reconstruct a clean spectral feature vector, from a sequence of corrupted input feature vectors in addition to the corresponding preenhanced and noise feature vectors. Experimental results demonstrate that the proposed method significantly improves recognition performance, even in mismatched noise conditions.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1070-9908
1558-2361
DOI:10.1109/LSP.2016.2583658