Inter dataset variability compensation for speaker recognition

Bibliographic Details
Published in	2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4002-4006
Main Author	Aronowitz, Hagai
Format	Conference Proceeding
Language	English
Published	IEEE 01.05.2014
Subjects	Conferences; domain adaptation challenge; i-vector; inter dataset variability compensation; Mixers; NIST; robust speaker recognition; Robustness; Speaker recognition; Tin

Summary:	Recently, satisfactory results have been obtained in NIST speaker recognition evaluations. These results are mainly due to accurate modeling of a very large development dataset provided by LDC. However, in many realistic scenarios the usefulness of this development dataset is limited by a dataset mismatch, and collecting a large enough matched dataset is infeasible. In this work we analyze the sources of degradation for a particular setup in the context of an i-vector PLDA system and conclude that the main source of degradation is an i-vector dataset shift. As a remedy, we introduce inter dataset variability compensation (IDVC), which explicitly compensates for dataset shift in the i-vector space using the nuisance attribute projection (NAP) method. Using IDVC, we reduced the error by more than 50% in the domain mismatch setup.
ISSN:	1520-6149, 2379-190X
DOI:	10.1109/ICASSP.2014.6854353
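
Illustrative sketch (not taken from the paper): the summary says that IDVC compensates for dataset shift in the i-vector space using NAP, but it does not say how the nuisance subspace is estimated. The NumPy code below assumes one plausible reading: the development data is split into labeled subsets, the mean i-vector of each subset is computed, and the leading principal directions of those means are projected out with P = I - U U^T. The function names, the `subset_ids` labels, and the `num_dims` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def idvc_projection(ivectors, subset_ids, num_dims):
    """Build a NAP-style projection that removes between-subset variation.

    Assumption (not from the summary): the nuisance subspace is spanned by
    the leading principal directions of the per-subset mean i-vectors.
    """
    ivectors = np.asarray(ivectors, dtype=float)
    subset_ids = np.asarray(subset_ids)

    # Mean i-vector of each development subset.
    means = np.stack(
        [ivectors[subset_ids == s].mean(axis=0) for s in np.unique(subset_ids)]
    )

    # Center the subset means so that only between-subset shift remains.
    centered = means - means.mean(axis=0)

    # Rows of vt are orthonormal principal directions of the centered means.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    u = vt[:num_dims].T  # shape: (dim, num_dims)

    # NAP projection onto the orthogonal complement of the nuisance subspace.
    return np.eye(ivectors.shape[1]) - u @ u.T


def apply_projection(ivectors, projection):
    """Apply the (symmetric) projection to row-wise i-vectors."""
    return np.asarray(ivectors, dtype=float) @ projection
```

In such a setup, the same projection would be applied to development, enrollment, and test i-vectors before PLDA training and scoring. With K subsets, the centered means span at most K - 1 directions, so `num_dims` would be chosen no larger than that.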