DNN-Based Feature Enhancement Using DOA-Constrained ICA for Robust Speech Recognition
The performance of automatic speech recognition (ASR) system is often degraded in adverse real-world environments. In recent times, deep learning has successfully emerged as a breakthrough for acoustic modeling in ASR; accordingly, deep neural network (DNN)-based speech feature enhancement (FE) appr...
Saved in:
Published in | IEEE signal processing letters Vol. 23; no. 8; pp. 1091 - 1095 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published |
New York
IEEE
01.08.2016
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | The performance of automatic speech recognition (ASR) system is often degraded in adverse real-world environments. In recent times, deep learning has successfully emerged as a breakthrough for acoustic modeling in ASR; accordingly, deep neural network (DNN)-based speech feature enhancement (FE) approaches have attracted much attention owing to their powerful modeling capabilities. However, DNN-based approaches are unable to achieve remarkable performance improvements for speech with severe distortion in the test environments different from training environments. In this letter, we propose a DNN-based FE method where the DNN inputs include preenhanced spectral features computed from multichannel input signals to reconstruct noise-robust features. The preenhanced spectral features are obtained by direction-of-arrival (DOA)-constrained independent component analysis (DCICA) followed by Bayesian FE using a hidden-Markov-model prior, to exploit the capabilities of efficient online target speech extraction and efficient FE with prior information for robust ASR. In addition, noise spectral features computed from DCICA are included for further improvement. Therefore, the DNN is trained to reconstruct a clean spectral feature vector, from a sequence of corrupted input feature vectors in addition to the corresponding preenhanced and noise feature vectors. Experimental results demonstrate that the proposed method significantly improves recognition performance, even in mismatched noise conditions. |
---|---|
Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
ISSN: | 1070-9908 1558-2361 |
DOI: | 10.1109/LSP.2016.2583658 |