Robust Speech Recognition Using a Cepstral Minimum-Mean-Square-Error-Motivated Noise Suppressor

Bibliographic Details
Published in	IEEE Transactions on Audio, Speech, and Language Processing, Vol. 16, No. 5, pp. 1061-1070
Main Authors	Dong Yu, Li Deng, J. Droppo, Jian Wu, Yifan Gong, A. Acero
Format	Journal Article
Language	English
Published	Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE), 1 July 2008
Subjects	Algorithms; Applied sciences; Cepstral analysis; Channels; Detection, estimation, filtering, equalization, prediction; Discrete Fourier transforms; Error analysis; Errors; Exact sciences and technology; Filter bank; Information, signal and communications theory; Mel frequency cepstral coefficient; Mel-frequency cepstral coefficient (MFCC); minimum-mean-square-error (MMSE) estimate; Miscellaneous; Noise; Noise level; Noise reduction; Noise robustness; phase asynchrony; robust automatic speech recognition (ASR); Signal and communications theory; Signal processing; Signal, noise; Spectra; Speech processing; Speech recognition; Statistics; Suppressors; Telecommunications and information theory; Error rate; Discrete Fourier transformation; Baseline; Algorithm; Mismatching; Optimization; Mean square error; Noise immunity; Automatic recognition; Piecewise linearization
Summary:	We present an efficient and effective nonlinear feature-domain noise suppression algorithm for noise-robust speech recognition, motivated by the minimum-mean-square-error (MMSE) optimization criterion. Unlike the log-MMSE spectral amplitude noise suppressor proposed by Ephraim and Malah (E&M), our algorithm aims to minimize an error expressed explicitly in terms of the Mel-frequency cepstra rather than the discrete Fourier transform (DFT) spectra, and it operates on the output of the Mel-frequency filter bank. As a consequence, the statistics used to estimate the suppression factor differ substantially from those used in the E&M log-MMSE suppressor. Our algorithm is also significantly more efficient than the E&M log-MMSE suppressor, because the Mel-frequency filter bank has far fewer channels (23 in our case) than the DFT has bins (256). We conducted extensive speech recognition experiments on the standard Aurora-3 task. The results show word error rate reductions of 48% over the standard ICSLP02 baseline, 26% over the cepstral mean normalization baseline, and 13% over the popular E&M log-MMSE noise suppressor. The experiments also show that our algorithm performs slightly better than the ETSI advanced front end (AFE) on the well-matched and mid-mismatched settings, and makes 8% and 10% fewer errors, respectively, than our earlier SPLICE (stereo-based piecewise linear compensation for environments) system on these settings.
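The sketch below illustrates the kind of pipeline the summary describes, under several assumptions that are not taken from the paper. It maps a 256-bin DFT frame onto 23 triangular Mel channels (an HTK-style Mel scale, an 8 kHz sampling rate, and a filter bank applied to DFT magnitudes are all assumed), computes one suppression gain per channel, and converts the suppressed log Mel outputs to cepstra with an orthonormal DCT. Because the cepstrum is a linear transform of the log Mel outputs, the MMSE estimate of the cepstrum equals the DCT of the MMSE estimate of the log Mel outputs, so a per-channel log-domain estimator is enough. The gain used here is E&M's log-MMSE formula, G = ξ/(1+ξ) · exp(E1(v)/2) with v = ξγ/(1+ξ), reused per channel purely as a stand-in; the summary states that the paper's statistics differ, so this is not the authors' suppression factor. The helper names (`exp1`, `mel_filterbank`, `dct_matrix`, `log_mmse_gain`, `suppress_frame`) and parameters such as `xi_min` are hypothetical.

```python
import math

import numpy as np

EULER_GAMMA = 0.5772156649015329


def exp1(x: float) -> float:
    """Exponential integral E1(x) for x > 0."""
    if x <= 0.0:
        raise ValueError("exp1 expects x > 0")
    if x <= 1.0:
        # Series: E1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
        total, term = 0.0, 1.0
        for k in range(1, 60):
            term *= -x / k
            total += term / k
        return -EULER_GAMMA - math.log(x) - total
    # Continued fraction evaluated with the modified Lentz method (x > 1)
    b = x + 1.0
    c = 1.0e300
    d = 1.0 / b
    h = d
    for i in range(1, 300):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x)


def mel_filterbank(n_channels=23, n_bins=256, sample_rate=8000.0):
    """Triangular filters mapping n_bins DFT bins (0..Nyquist) to n_channels outputs.

    The HTK-style Mel scale and the 8 kHz sampling rate are assumptions.
    """
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    bin_hz = np.linspace(0.0, sample_rate / 2.0, n_bins)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_channels + 2))
    fb = np.zeros((n_channels, n_bins))
    for j in range(n_channels):
        lo, mid, hi = edges[j], edges[j + 1], edges[j + 2]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        fb[j] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II rows, so that cepstra = dct_matrix(...) @ log_mel."""
    k = np.arange(n_out)[:, None]
    j = np.arange(n_in)[None, :]
    mat = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (j + 0.5) / n_in)
    mat[0] /= np.sqrt(2.0)
    return mat


def log_mmse_gain(xi, gamma):
    """E&M log-MMSE gain xi/(1+xi) * exp(E1(v)/2), v = xi*gamma/(1+xi); a stand-in here."""
    xi = np.asarray(xi, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    v = np.maximum(xi * gamma / (1.0 + xi), 1e-12)  # keep E1 finite
    return xi / (1.0 + xi) * np.exp(0.5 * np.vectorize(exp1)(v))


def suppress_frame(power_spectrum, noise_mel_power, fb, n_ceps=13, xi_min=10 ** (-2.5)):
    """Suppress one frame on the Mel channels; return (cepstra, per-channel gains)."""
    mel_mag = fb @ np.sqrt(power_spectrum)            # one amplitude-like value per channel
    gamma = mel_mag ** 2 / noise_mel_power            # a posteriori SNR per channel
    xi = np.maximum(gamma - 1.0, xi_min)              # ML a priori SNR with a floor
    gain = np.minimum(log_mmse_gain(xi, gamma), 1.0)  # cap at 1: a practical choice
    log_mel = np.log(np.maximum(gain * mel_mag, 1e-10))
    return dct_matrix(n_ceps, fb.shape[0]) @ log_mel, gain


if __name__ == "__main__":
    fb = mel_filterbank()                             # 23 x 256
    noise_psd = np.ones(256)                          # hypothetical flat noise spectrum
    noise_mel_power = (fb @ np.sqrt(noise_psd)) ** 2
    ceps, gain = suppress_frame(100.0 * noise_psd, noise_mel_power, fb)
    print(ceps.shape, gain.round(3))
```

Only 23 gains (and 23 exponential-integral evaluations) are computed per frame instead of 256, which mirrors the efficiency argument in the summary. The maximum-likelihood a priori SNR with a floor and the cap at 1 are simplifications chosen for brevity; E&M's decision-directed a priori SNR estimate is the usual alternative.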
ISSN:	1558-7916, 1558-7924
DOI:	10.1109/TASL.2008.921761