Speaker diarization with plda i-vector scoring and unsupervised calibration

Speaker diarization via unsupervised i-vector clustering has gained popularity in recent years. In this approach, i-vectors are extracted from short clips of speech segmented from a larger multi-speaker conversation and organized into speaker clusters, typically according to their cosine score. In t...

Full description

Saved in:
Bibliographic Details
Published in2014 IEEE Spoken Language Technology Workshop (SLT) pp. 413 - 417
Main Authors Sell, Gregory, Garcia-Romero, Daniel
Format Conference Proceeding
LanguageEnglish
Published IEEE 01.12.2014
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Speaker diarization via unsupervised i-vector clustering has gained popularity in recent years. In this approach, i-vectors are extracted from short clips of speech segmented from a larger multi-speaker conversation and organized into speaker clusters, typically according to their cosine score. In this paper, we propose a system that incorporates probabilistic linear discriminant analysis (PLDA) for i-vector scoring, a method already frequently utilized in speaker recognition tasks, and uses unsupervised calibration of the PLDA scores to determine the clustering stopping criterion. We also demonstrate that denser sampling in the i-vector space with overlapping temporal segments provides a gain in the diarization task. We test our system on the CALLHOME conversational telephone speech corpus, which includes multiple languages and a varying number of speakers, and we show that PLDA scoring outperforms the same system with cosine scoring, and that overlapping segments reduce diarization error rate (DER) as well.
DOI:10.1109/SLT.2014.7078610