Noise-Robust Speaker Recognition Combining Missing Data Techniques and Universal Background Modeling

Bibliographic Details
Published in	IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, no. 1, pp. 108-121
Main Authors	May, T., van de Par, S., Kohlrausch, A.
Format	Journal Article
Language	English
Published	Piscataway, NJ: IEEE (Institute of Electrical and Electronics Engineers), 01.01.2012
Subjects	Acoustic signal processing; Adaptation model; Applied sciences; Audio signal processing; Automatic recognition; Automatic speaker recognition (ASR); Background; Background noise; Cepstral analysis; Data models; Estimation; Exact sciences and technology; Information, signal and communications theory; Mask estimation; Materials; Mel frequency cepstral coefficient (MFCC); Miscellaneous; Missing data; Modeling; Noise immunity; Noise robustness; Non stationary condition; Robustness; Signal processing; Speaker adaptation; Speaker recognition; Speech; Speech processing; Speech recognition; Telecommunications and information theory; Universal background model (UBM)

More Information
Summary:	Although the field of automatic speaker recognition (ASR) has been the subject of extensive research over the past decades, the lack of robustness against background noise has remained a major challenge. This paper describes a noise-robust speaker recognition system that combines missing data (MD) recognition with the adaptation of speaker models using a universal background model (UBM). MD recognition requires identifying which feature components are reliable and which are unreliable. For this purpose, the signal-to-noise ratio (SNR) based mask estimation performance of various state-of-the-art noise estimation techniques and noise reduction schemes is compared. Speaker recognition experiments show that using a UBM in combination with missing data recognition yields substantial improvements in recognition performance, especially in the presence of highly non-stationary background noise at low SNRs.
ISSN:	1558-7916 1558-7924
DOI:	10.1109/TASL.2011.2158309
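
Illustration:	The summary refers to SNR-based mask estimation and missing data recognition. The sketch below is only a minimal illustration of those two ideas, not the method of the paper. It assumes per-channel spectral power features, a noise power estimate supplied from elsewhere, a 0 dB threshold, and plain marginalization over a diagonal-covariance Gaussian mixture model. The function names `estimate_mask` and `masked_log_likelihood` and all parameter values are hypothetical.

```python
import math

# Hypothetical helper: label each feature component as reliable (1) or
# unreliable (0) from a local SNR estimate. The noise power is assumed to
# come from some external noise estimator.
def estimate_mask(noisy_power, noise_power, snr_threshold_db=0.0):
    mask = []
    for y, n in zip(noisy_power, noise_power):
        n = max(n, 1e-12)
        speech = max(y - n, 1e-12)  # crude speech power estimate by subtraction
        snr_db = 10.0 * math.log10(speech / n)
        mask.append(1 if snr_db > snr_threshold_db else 0)
    return mask

# Hypothetical helper: log-likelihood of one frame under a diagonal-covariance
# GMM, where unreliable components are marginalized (skipped). This is plain
# marginalization; other missing data variants exist.
def masked_log_likelihood(features, mask, weights, means, variances):
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for x, m, mu_d, var_d in zip(features, mask, mu, var):
            if m:  # only reliable components contribute
                ll += -0.5 * (math.log(2.0 * math.pi * var_d)
                              + (x - mu_d) ** 2 / var_d)
        log_terms.append(ll)
    peak = max(log_terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))
```

In a recognition setting of this kind, the frame scores would typically be summed over time for each speaker model and the best-scoring speaker selected; the speaker models themselves could be adapted from a shared background model, which this sketch does not cover.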