DNN-Based Feature Enhancement Using DOA-Constrained ICA for Robust Speech Recognition

Bibliographic Details
Published in	IEEE Signal Processing Letters, Vol. 23, no. 8, pp. 1091-1095
Main Authors	Lee, Ho-Yong; Cho, Ji-Won; Kim, Minook; Park, Hyung-Min
Format	Journal Article
Language	English
Published	New York: IEEE, 01.08.2016; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Bayes methods; Computation; Deep neural networks (DNNs); feature enhancement (FE); independent component analysis (ICA); Iron; Mathematical analysis; Modelling; Neural networks; Noise; Noise measurement; Performance enhancement; Robustness; robust speech recognition; Spectra; Speech; Speech enhancement; Speech recognition; Vectors (mathematics)
Summary:	The performance of automatic speech recognition (ASR) systems is often degraded in adverse real-world environments. Recently, deep learning has emerged as a breakthrough for acoustic modeling in ASR; accordingly, deep neural network (DNN)-based speech feature enhancement (FE) approaches have attracted much attention owing to their powerful modeling capabilities. However, DNN-based approaches are unable to achieve remarkable performance improvements for severely distorted speech when the test environments differ from the training environments. In this letter, we propose a DNN-based FE method in which the DNN inputs include preenhanced spectral features computed from multichannel input signals to reconstruct noise-robust features. The preenhanced spectral features are obtained by direction-of-arrival (DOA)-constrained independent component analysis (DCICA) followed by Bayesian FE using a hidden-Markov-model prior, exploiting efficient online target speech extraction and efficient FE with prior information for robust ASR. In addition, noise spectral features computed from DCICA are included for further improvement. The DNN is therefore trained to reconstruct a clean spectral feature vector from a sequence of corrupted input feature vectors together with the corresponding preenhanced and noise feature vectors. Experimental results demonstrate that the proposed method significantly improves recognition performance, even in mismatched noise conditions.
ISSN:	1070-9908, 1558-2361
DOI:	10.1109/LSP.2016.2583658
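
Illustration (not part of the catalog record): the Summary describes a DNN input built from a sequence of corrupted feature vectors plus the corresponding preenhanced and noise feature vectors. The sketch below shows one way such an input could be assembled with NumPy. The function names `splice` and `build_dnn_input`, the context width, edge-repeat padding, and the choice to append only the center-frame preenhanced and noise vectors are all assumptions made for illustration; the record does not specify these details, and the DCICA and Bayesian FE stages are taken as given inputs here.

```python
import numpy as np


def splice(feats, context):
    """Stack each frame with `context` neighbours on either side.

    feats: array of shape (num_frames, dim).
    Edge frames are repeated as padding (an assumption of this sketch).
    Returns an array of shape (num_frames, (2 * context + 1) * dim).
    """
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.concatenate(
        [padded[i:i + num_frames] for i in range(2 * context + 1)], axis=1
    )


def build_dnn_input(noisy, preenhanced, noise, context=5):
    """Hypothetical input layout: spliced noisy frames, followed by the
    preenhanced and noise feature vectors of the center frame."""
    assert noisy.shape == preenhanced.shape == noise.shape
    return np.concatenate([splice(noisy, context), preenhanced, noise], axis=1)


if __name__ == "__main__":
    # Random stand-ins for real spectral features: 100 frames, 40 dimensions.
    rng = np.random.default_rng(0)
    noisy, pre, noise = (rng.standard_normal((100, 40)) for _ in range(3))
    print(build_dnn_input(noisy, pre, noise).shape)
```

With a context of 5 and 40-dimensional features, each input row holds 11 spliced noisy frames plus two 40-dimensional side vectors; the training target would be the clean feature vector of the same center frame.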