Auditory Features Revisited for Robust Speech Recognition

Bibliographic Details
Published in 2010 20th International Conference on Pattern Recognition, pp. 4456-4459
Main Authors Kelly, F, Harte, N
Format Conference Proceeding
Language English
Published IEEE 01.08.2010
Summary: Auditory-based front-ends for speech recognition have been compared before, but this paper focuses on two of the most promising algorithms for noise robustness in automatic speech recognition (ASR). The feature sets are Zero-Crossings with Peak Amplitudes (ZCPA) and the recently introduced Power-Law Nonlinearity and Power-Bias Subtraction (PNCC). Standard Mel-Frequency Cepstral Coefficients (MFCC) are also tested for reference. The performance of all features is reported on the TIMIT database using an HMM-based recogniser. It is found that the PNCC features outperform MFCC in clean conditions and are the most robust to noise. ZCPA performance is shown to vary widely with filter bank configuration and frame length. ZCPA performs poorly in clean conditions but is the least affected by white noise. PNCC is shown to be the most promising new feature set for robust ASR in recent years.
ISBN: 1424475422, 9781424475421
ISSN: 1051-4651, 2831-7475
DOI: 10.1109/ICPR.2010.1082