Speaker Re-identification with Speaker Dependent Speech Enhancement

Bibliographic Details
Main Authors Shi, Yanpei, Huang, Qiang, Hain, Thomas
Format Journal Article
Language English
Published 15.05.2020
Abstract While the use of deep neural networks has significantly boosted speaker recognition performance, it is still challenging to separate speakers in poor acoustic environments. In such conditions, speech enhancement methods have traditionally improved performance. Recent work has shown that adapting speech enhancement can lead to further gains. This paper introduces a novel approach that cascades speech enhancement and speaker recognition. In the first step, a speaker embedding vector is generated, which is used in the second step to enhance the speech quality and re-identify the speakers. Models are trained in an integrated framework with joint optimisation. The proposed approach is evaluated using the VoxCeleb1 dataset, which aims to assess speaker recognition in real-world situations. In addition, three types of noise at different signal-to-noise ratios were added for this work. The results show that the proposed approach, using speaker-dependent speech enhancement, can yield better speaker recognition and speech enhancement performance than two baselines in various noise conditions.
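The abstract states that noise was added at different signal-to-noise ratios but does not say how. The following is a minimal sketch of the standard way to mix a noise signal into clean speech at a target SNR, not the authors' pipeline. The function names `mix_at_snr` and `measured_snr_db`, the synthetic signals, and the SNR values are all hypothetical and chosen only for illustration.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the speech-to-noise power ratio equals snr_db."""
    # Repeat or trim the noise so it covers the whole speech signal.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)

    # SNR_dB = 10 * log10(P_speech / P_noise), solved for the noise power.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    scale = np.sqrt(target_noise_power / noise_power)
    return speech + scale * noise


def measured_snr_db(clean, noisy):
    """Recover the SNR of a mixture, given the clean reference signal."""
    residual = noisy - clean
    return 10 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))


# Illustrative signals only: a sine tone stands in for speech,
# white Gaussian noise stands in for a recorded noise type.
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noise = rng.standard_normal(4000)

for snr_db in (0, 5, 10):  # hypothetical SNR levels
    noisy = mix_at_snr(speech, noise, snr_db)
    print(snr_db, round(measured_snr_db(speech, noisy), 3))
```

Scaling the noise rather than the speech keeps the clean reference unchanged across conditions, which is convenient when the same utterance is evaluated at several SNRs.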
Copyright http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI 10.48550/arxiv.2005.07818
Online Access https://arxiv.org/abs/2005.07818
SecondaryResourceType preprint
Subjects Computer Science - Computation and Language; Computer Science - Sound