Speaker Re-identification with Speaker Dependent Speech Enhancement

Bibliographic Details
Main Authors	Shi, Yanpei; Huang, Qiang; Hain, Thomas
Format	Journal Article
Language	English
Published	15.05.2020
Subjects	Computer Science - Computation and Language; Computer Science - Sound
Online Access	https://arxiv.org/abs/2005.07818

Abstract	While the use of deep neural networks has significantly boosted speaker recognition performance, it is still challenging to separate speakers in poor acoustic environments. Here, speech enhancement methods have traditionally allowed improved performance. Recent work has shown that adapting speech enhancement can lead to further gains. This paper introduces a novel approach that cascades speech enhancement and speaker recognition. In the first step, a speaker embedding vector is generated, which is used in the second step to enhance the speech quality and re-identify the speakers. Models are trained in an integrated framework with joint optimisation. The proposed approach is evaluated using the VoxCeleb1 dataset, which aims to assess speaker recognition in real-world situations. In addition, three types of noise at different signal-to-noise ratios were added for this work. The obtained results show that the proposed approach using speaker-dependent speech enhancement can yield better speaker recognition and speech enhancement performance than two baselines in various noise conditions.
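The abstract does not say how the three noise types were added at each signal-to-noise ratio. A common approach, sketched here under that assumption, scales the noise by a gain g so that 10 * log10(P_speech / (g^2 * P_noise)) equals the target SNR in dB, where P is mean power. The function names `mix_at_snr` and `snr_db`, the sine-tone "speech", the Gaussian noise, and the SNR values are hypothetical illustrations, not the authors' code, data, or settings.

```python
import math
import random


# Hypothetical helper: a generic SNR-controlled noise mixer, NOT the
# authors' data-preparation code (the abstract does not describe it).
def mix_at_snr(speech, noise, target_snr_db):
    """Return speech + g * noise, with g chosen so the mixture has target_snr_db."""
    if not speech:
        raise ValueError("speech must be non-empty")
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]  # trim noise to the speech length
    p_speech = sum(s * s for s in speech) / len(speech)  # mean speech power
    p_noise = sum(n * n for n in noise) / len(noise)  # mean noise power
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    # Solve 10 * log10(p_speech / (g**2 * p_noise)) = target_snr_db for g.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def snr_db(speech, mixture):
    """Measure the SNR (dB) of a mixture against its clean speech signal."""
    residual = [m - s for m, s in zip(mixture, speech)]  # the added noise
    p_speech = sum(s * s for s in speech)
    p_residual = sum(r * r for r in residual)
    return 10 * math.log10(p_speech / p_residual)


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-ins for real audio: a 440 Hz tone sampled at 16 kHz, plus white noise.
    speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
    for target in (0, 5, 10):  # illustrative SNR levels, not the paper's
        mixed = mix_at_snr(speech, noise, target)
        print(f"target {target} dB -> measured {snr_db(speech, mixed):.2f} dB")
```

Measuring the SNR on the residual (mixture minus clean speech) recovers the requested value up to floating-point error, which is a quick sanity check for any such mixing routine.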
Copyright	http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI	10.48550/arxiv.2005.07818
Open Access	Yes
Peer Reviewed	No
Open Access Link	https://arxiv.org/abs/2005.07818
Resource Type	Preprint