Speaker segmentation using deep speaker vectors for fast speaker change scenarios

Bibliographic Details
Published in	2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5420-5424
Main Authors	Renyu Wang, Mingliang Gu, Lantian Li, Mingxing Xu, Thomas Fang Zheng
Format	Conference Proceeding
Language	English
Published	IEEE 01.03.2017
Subjects	deep neural networks; Feature extraction; Hidden Markov models; Neural networks; Speaker segmentation; speaker vector; Speech; Speech processing; Training
Summary:	A novel speaker segmentation approach based on a deep neural network is proposed and investigated. The approach uses deep speaker vectors (d-vectors) to represent speaker characteristics and to find speaker change points. The d-vector is a frame-level speaker-discriminative feature whose discriminative training corresponds to the goal of distinguishing a speaker change point from a single-speaker speech segment within a short time window. Following traditional metric-based segmentation, each analysis window contains two sub-windows and is shifted along the audio stream to detect speaker change points; the speaker characteristics of each sub-window are represented by the mean of the deep speaker vectors over all of its frames. Experiments conducted in fast speaker change scenarios show that the proposed method detects speaker change points more quickly and more effectively than commonly used segmentation methods.
ISSN:	2379-190X
DOI:	10.1109/ICASSP.2017.7953192
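
The sliding-window scheme described in the Summary can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the frame-level d-vectors have already been extracted into a NumPy array, and the cosine distance, the threshold, the local-maximum peak picking, and every name and default value (detect_change_points, sub_win, shift, threshold) are assumptions made for the example. The record does not state which distance or decision rule the paper uses.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance between two vectors (assumed metric, not from the record)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0
    return 1.0 - float(np.dot(a, b) / denom)


def detect_change_points(dvectors, sub_win=50, shift=10, threshold=0.5):
    """Slide a two-sub-window analysis window over frame-level d-vectors.

    dvectors:  array of shape (n_frames, dim), one d-vector per frame.
    sub_win:   frames per sub-window (hypothetical default).
    shift:     frames the analysis window moves per step (hypothetical default).
    threshold: minimum distance for a change point (hypothetical default).
    Returns (change_frames, boundary_frames, scores).
    """
    n_frames = dvectors.shape[0]
    centers, scores = [], []
    for start in range(0, n_frames - 2 * sub_win + 1, shift):
        mid = start + sub_win
        # Each sub-window is represented by the mean of its frame d-vectors.
        left = dvectors[start:mid].mean(axis=0)
        right = dvectors[mid:mid + sub_win].mean(axis=0)
        centers.append(mid)
        scores.append(cosine_distance(left, right))

    # Assumed decision rule: keep local maxima of the distance curve
    # that exceed the threshold.
    changes = []
    for i, s in enumerate(scores):
        prev_s = scores[i - 1] if i > 0 else -np.inf
        next_s = scores[i + 1] if i + 1 < len(scores) else -np.inf
        if s > threshold and s >= prev_s and s > next_s:
            changes.append(centers[i])
    return changes, centers, scores
```

With synthetic input in which the d-vectors switch abruptly from one constant vector to an orthogonal one, the distance curve peaks at the boundary between the two sub-windows. Shorter sub-windows react to changes sooner but average over fewer frames, which is the trade-off that matters in fast speaker change scenarios.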