Auditory Features Revisited for Robust Speech Recognition
Published in | 2010 20th International Conference on Pattern Recognition pp. 4456 - 4459 |
---|---|
Main Authors | , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.08.2010 |
Summary: | Auditory-based front-ends for speech recognition have been compared before, but this paper focuses on two of the most promising algorithms for noise robustness in automatic speech recognition (ASR). The feature sets are Zero-Crossings with Peak Amplitudes (ZCPA) and the recently introduced Power-Law Nonlinearity and Power-Bias Subtraction (PNCC). Standard Mel-Frequency Cepstral Coefficients (MFCC) are also tested for reference. The performance of all features is reported on the TIMIT database using an HMM-based recogniser. It is found that the PNCC features outperform MFCC in clean conditions and are the most robust to noise. ZCPA performance is shown to vary widely with filter bank configuration and frame length. ZCPA performs poorly in clean conditions but is the least affected by white noise. PNCC is shown to be the most promising new feature set for robust ASR in recent years. |
ISBN: | 1424475422 9781424475421 |
ISSN: | 1051-4651 2831-7475 |
DOI: | 10.1109/ICPR.2010.1082 |