Multistage speaker diarization of broadcast news

This paper describes recent advances in speaker diarization with a multistage segmentation and clustering system, which incorporates a speaker identification step. This system builds upon the baseline audio partitioner used in the LIMSI broadcast news transcription system. The baseline partitioner p...

Full description

Saved in:
Bibliographic Details
Published inIEEE transactions on audio, speech, and language processing Vol. 14; no. 5; pp. 1505 - 1512
Main Authors Barras, C., Xuan Zhu, Meignier, S., Gauvain, J.-L.
Format Journal Article
LanguageEnglish
Published IEEE 01.09.2006
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:This paper describes recent advances in speaker diarization with a multistage segmentation and clustering system, which incorporates a speaker identification step. This system builds upon the baseline audio partitioner used in the LIMSI broadcast news transcription system. The baseline partitioner provides a high cluster purity, but has a tendency to split data from speakers with a large quantity of data into several segment clusters. Several improvements to the baseline system have been made. First, the iterative Gaussian mixture model (GMM) clustering has been replaced by a Bayesian information criterion (BIC) agglomerative clustering. Second, an additional clustering stage has been added, using a GMM-based speaker identification method. Finally, a post-processing stage refines the segment boundaries using the output of a transcription system. On the National Institute of Standards and Technology (NIST) RT-04F and ESTER evaluation data, the multistage system reduces the speaker error by over 70% relative to the baseline system, and gives between 40% and 50% reduction relative to a single-stage BIC clustering system
Bibliography:ObjectType-Article-2
SourceType-Scholarly Journals-1
ObjectType-Feature-1
content type line 23
ISSN:1558-7916
1558-7924
DOI:10.1109/TASL.2006.878261