Target Speaker Extraction with Curriculum Learning

Bibliographic Details
Main Authors Liu, Yun; Liu, Xuechen; Miao, Xiaoxiao; Yamagishi, Junichi
Format Journal Article
Language English
Published 11 June 2024

Abstract This paper presents a novel approach to target speaker extraction (TSE) using Curriculum Learning (CL) techniques, addressing the challenge of distinguishing a target speaker's voice from a mixture that contains interfering speakers. For efficient training, we propose a curriculum that strategically selects training subsets of increasing complexity, for example by increasing the similarity between the target and interfering speakers. Our CL strategies include variants that use predefined difficulty measures (e.g., gender, speaker similarity, and signal-to-distortion ratio) and variants that use the TSE model's standard objective function; each is designed to expose the model gradually to more challenging scenarios. Comprehensive testing on the Libri2talker dataset showed that our CL strategies improved TSE performance, outperforming baseline models trained without CL by about 1 dB.
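The abstract describes the curriculum only at a high level. Below is a minimal NumPy sketch, not the authors' implementation, of how a predefined difficulty measure (here, the input SDR of each mixture relative to its target) can order training examples from easy to hard, with a pacing schedule that gradually exposes the model to harder mixtures. All function names, the linear pacing schedule, and `start_fraction` are hypothetical choices for illustration. The negative SI-SDR loss is included on the assumption that it could stand in for the "standard objective function" mentioned in the abstract, which the abstract does not name.

```python
import numpy as np


def sdr_db(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
    """Plain SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    error = reference - estimate
    return float(10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(error ** 2) + eps)))


def si_sdr_loss(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
    """Negative scale-invariant SDR; ASSUMED here as the TSE training objective."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to remove any gain difference.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    projection = alpha * reference
    residual = estimate - projection
    si_sdr = 10.0 * np.log10((np.sum(projection ** 2) + eps) / (np.sum(residual ** 2) + eps))
    return float(-si_sdr)


def cosine_similarity(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> float:
    """Similarity of two speaker embeddings; higher means harder to tell apart."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def easy_to_hard(difficulty: np.ndarray) -> np.ndarray:
    """Example indices sorted from easiest (lowest score) to hardest."""
    return np.argsort(difficulty, kind="stable")


def linear_pacing(epoch: int, num_epochs: int, num_examples: int,
                  start_fraction: float = 0.3) -> int:
    """How many of the easiest examples are used at `epoch` (hypothetical schedule)."""
    if num_epochs <= 1:
        return num_examples
    frac = start_fraction + (1.0 - start_fraction) * epoch / (num_epochs - 1)
    return max(1, min(num_examples, int(round(frac * num_examples))))


def curriculum_subset(difficulty: np.ndarray, epoch: int, num_epochs: int,
                      start_fraction: float = 0.3) -> np.ndarray:
    """Indices of the training examples exposed to the model at `epoch`."""
    order = easy_to_hard(difficulty)
    k = linear_pacing(epoch, num_epochs, len(order), start_fraction)
    return order[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.standard_normal(16000)  # stand-in for 1 s of target speech at 16 kHz
    # Synthetic "interference" at four gains; louder interference -> lower input SDR.
    mixtures = [target + g * rng.standard_normal(16000) for g in (0.1, 1.0, 0.5, 2.0)]
    # Predefined measure: negate the input SDR so that a higher score means harder.
    difficulty = np.array([-sdr_db(target, m) for m in mixtures])
    for epoch in range(3):
        print(epoch, curriculum_subset(difficulty, epoch, num_epochs=3, start_fraction=0.5))
    # 0 [0 2]
    # 1 [0 2 1]
    # 2 [0 2 1 3]
```

The same `curriculum_subset` call accepts any per-example score in which a higher value means harder. For a speaker-similarity curriculum, the score could be `cosine_similarity` between embeddings of the target and interfering speakers from some speaker encoder (not specified here); for a gender curriculum, a 0/1 same-gender flag. For a loss-based variant, the score would instead be `si_sdr_loss` of the current model's output on each mixture, recomputed every epoch so that the ordering tracks what the model currently finds hard.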
Copyright http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI 10.48550/arxiv.2406.07845
DatabaseName arXiv Computer Science; arXiv.org
IsOpenAccess true
IsPeerReviewed false
OpenAccessLink https://arxiv.org/abs/2406.07845
SecondaryResourceType preprint
SubjectTerms Computer Science - Sound
linkProvider Cornell University