Latent discriminative representation learning for speaker recognition

Bibliographic Details
Published in: Frontiers of Information Technology & Electronic Engineering, Vol. 22, No. 5, pp. 697-708
Main Authors: Huang, Duolin; Mao, Qirong; Ma, Zhongchen; Zheng, Zhishen; Routray, Sidheswar; Ocquaye, Elias-Nii-Noi
Format: Journal Article
Language: English
Published: Zhejiang University Press, Hangzhou; Springer Nature B.V., 01.05.2021
Author Affiliations: Jiangsu Key Laboratory of Security Technology for Industrial Cyberspace, Zhenjiang 212013, China; School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China
ISSN: 2095-9184, 2095-9230
DOI: 10.1631/FITEE.1900690

More Information
Summary: Extracting discriminative speaker-specific representations from speech signals and transforming them into fixed-length vectors are key steps in speaker identification and verification systems. In this study, we propose a latent discriminative representation learning method for speaker recognition, in which the learned representations are not only discriminative but also relevant. Specifically, we introduce an additional speaker-embedding lookup table to exploit the relevance between different utterances from the same speaker. Moreover, a reconstruction constraint that learns a linear mapping matrix is introduced to make the representations discriminative. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods on the Apollo dataset used in the Fearless Steps Challenge at INTERSPEECH 2019 and on the TIMIT dataset.
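The record carries only the abstract, so the following PyTorch sketch is merely an illustration of how the two mechanisms the summary names could fit together: a speaker-embedding lookup table that relates utterances of the same speaker, and a linear mapping matrix trained under a reconstruction constraint. Every name and hyperparameter below (LatentDiscriminativeEmbedder, the layer sizes, the loss weight alpha) is an assumption for illustration, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentDiscriminativeEmbedder(nn.Module):
    # Illustrative sketch only; architecture details are assumptions.
    def __init__(self, feat_dim=40, emb_dim=256, num_speakers=630):
        super().__init__()
        # Frame-level encoder (stand-in for the paper's front-end network).
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, emb_dim),
        )
        # Speaker-embedding lookup table: one learned vector per training
        # speaker, shared by all of that speaker's utterances.
        self.speaker_table = nn.Embedding(num_speakers, emb_dim)
        # Linear mapping matrix learned under the reconstruction constraint.
        self.mapping = nn.Linear(emb_dim, emb_dim, bias=False)
        # Softmax classifier supplying the discriminative objective.
        self.classifier = nn.Linear(emb_dim, num_speakers)

    def forward(self, frames):
        # frames: (batch, time, feat_dim); average pooling over time yields
        # the fixed-length utterance vector mentioned in the abstract.
        return self.encoder(frames).mean(dim=1)

    def loss(self, frames, speaker_ids, alpha=0.1):
        utt = self.forward(frames)
        # Discriminative term: identify the speaker from the utterance vector.
        ce = F.cross_entropy(self.classifier(utt), speaker_ids)
        # Relevance term: the linearly mapped utterance vector should
        # reconstruct the shared embedding of its speaker, pulling different
        # utterances of the same speaker toward a common point.
        target = self.speaker_table(speaker_ids)
        rec = F.mse_loss(self.mapping(utt), target)
        return ce + alpha * rec

A usage sketch under the same assumptions:

model = LatentDiscriminativeEmbedder()
frames = torch.randn(8, 200, 40)          # 8 utterances, 200 frames each
speakers = torch.randint(0, 630, (8,))    # TIMIT has 630 speakers
loss = model.loss(frames, speakers)
loss.backward()

At verification time one would discard the classifier and compare utterance vectors directly (e.g., by cosine similarity); the lookup table only supplies a training-time target.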