Speaker diarization with PLDA i-vector scoring and unsupervised calibration

Bibliographic Details
Published in	2014 IEEE Spoken Language Technology Workshop (SLT), pp. 413-417
Main Authors	Sell, Gregory; Garcia-Romero, Daniel
Format	Conference Proceeding
Language	English
Published	IEEE 01.12.2014
Subjects	Calibration; Density estimation robust algorithm; Principal component analysis; Speaker recognition; Speech; Speech processing
Summary:	Speaker diarization via unsupervised i-vector clustering has gained popularity in recent years. In this approach, i-vectors are extracted from short clips of speech segmented from a larger multi-speaker conversation and organized into speaker clusters, typically according to their cosine score. In this paper, we propose a system that incorporates probabilistic linear discriminant analysis (PLDA) for i-vector scoring, a method already frequently utilized in speaker recognition tasks, and uses unsupervised calibration of the PLDA scores to determine the clustering stopping criterion. We also demonstrate that denser sampling in the i-vector space with overlapping temporal segments provides a gain in the diarization task. We test our system on the CALLHOME conversational telephone speech corpus, which includes multiple languages and a varying number of speakers, and we show that PLDA scoring outperforms the same system with cosine scoring, and that overlapping segments reduce diarization error rate (DER) as well.
DOI:	10.1109/SLT.2014.7078610
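
The pipeline outlined in the summary (overlapping temporal segments, pairwise scoring of per-segment vectors, and clustering up to a stopping criterion) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses the cosine-score baseline rather than PLDA, assumes average-linkage agglomerative clustering, and stops at a fixed, hand-picked threshold instead of one derived from unsupervised calibration. The function names, the toy 2-D vectors standing in for i-vectors, and all parameter values are hypothetical.

```python
import math


def overlapping_segments(duration, win, hop):
    """Return (start, end) windows of `win` seconds, one every `hop` seconds.

    Choosing hop < win gives overlapping segments, i.e. denser sampling.
    """
    segments = []
    start = 0.0
    while start < duration:
        segments.append((start, min(start + win, duration)))
        if start + win >= duration:
            break
        start += hop
    return segments


def cosine_score(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cluster(vectors, threshold, score=cosine_score):
    """Average-linkage agglomerative clustering (an assumption of this sketch).

    Repeatedly merges the best-scoring pair of clusters and stops once the
    best score falls below `threshold` (a hypothetical fixed value).
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(score(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, (i, j) = max(
            (linkage(clusters[i], clusters[j]), (i, j))
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy stand-ins for i-vectors: two segments per "speaker".
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
speaker_clusters = cluster(toy_vectors, threshold=0.5)
windows = overlapping_segments(duration=3.0, win=2.0, hop=1.0)
```

Because `cluster` takes the scoring function as a parameter, a different pairwise score, such as a PLDA log-likelihood ratio, could be passed in without changing the clustering loop; the threshold would then need to be chosen on that score's scale.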