Target Speaker Extraction with Curriculum Learning

Bibliographic Details
Main Authors	Liu, Yun, Liu, Xuechen, Miao, Xiaoxiao, Yamagishi, Junichi
Format	Journal Article
Language	English
Published	11.06.2024
Subjects	Computer Science - Sound

Abstract	This paper presents a novel approach to target speaker extraction (TSE) using curriculum learning (CL) techniques, addressing the challenge of distinguishing a target speaker's voice from a mixture containing interfering speakers. For efficient training, we propose designing a curriculum that selects training data strategically, in subsets of increasing complexity, for example with increasing similarity between target and interfering speakers. Our CL strategies include variants that use predefined difficulty measures (e.g., gender, speaker similarity, and signal-to-distortion ratio) and variants that use the TSE model's standard objective function, each designed to expose the model gradually to more challenging scenarios. Comprehensive testing on the Libri2talker dataset demonstrated that our CL strategies improved TSE performance, with results exceeding those of baseline models trained without CL by about 1 dB.
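The idea of a curriculum built from a predefined difficulty measure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how subsets are scheduled, so the cumulative staging below, the `Mixture` record, and the function name `curriculum_stages` are all assumptions made for illustration. Speaker similarity stands in as the difficulty score, on the assumption that a more similar interfering speaker makes extraction harder.

```python
from dataclasses import dataclass
from math import ceil
from typing import List


@dataclass
class Mixture:
    """Hypothetical record for one training mixture (illustrative only)."""
    utt_id: str
    # Assumed difficulty score: similarity between target and interfering
    # speaker embeddings; higher similarity is treated as harder.
    speaker_similarity: float


def curriculum_stages(mixtures: List[Mixture], num_stages: int) -> List[List[Mixture]]:
    """Return cumulative training subsets ordered from easy to hard.

    Stage k contains the easiest k/num_stages fraction of the data, so each
    later stage adds harder mixtures on top of the earlier ones. This
    schedule is an assumption, not taken from the paper.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    # Sort from easiest (least similar speakers) to hardest.
    ordered = sorted(mixtures, key=lambda m: m.speaker_similarity)
    stages = []
    for k in range(1, num_stages + 1):
        cutoff = ceil(len(ordered) * k / num_stages)
        stages.append(ordered[:cutoff])
    return stages
```

A training loop would then iterate over the returned stages in order, training for some number of epochs on each subset before moving to the next. Other predefined measures named in the abstract, such as gender or signal-to-distortion ratio, could replace the sort key under the same scheme.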
Author	Yamagishi, Junichi Liu, Yun Miao, Xiaoxiao Liu, Xuechen
Author_xml	– sequence: 1 givenname: Yun surname: Liu fullname: Liu, Yun – sequence: 2 givenname: Xuechen surname: Liu fullname: Liu, Xuechen – sequence: 3 givenname: Xiaoxiao surname: Miao fullname: Miao, Xiaoxiao – sequence: 4 givenname: Junichi surname: Yamagishi fullname: Yamagishi, Junichi
BackLink	https://doi.org/10.48550/arXiv.2406.07845$$DView paper in arXiv
ContentType	Journal Article
Copyright	http://arxiv.org/licenses/nonexclusive-distrib/1.0
Copyright_xml	– notice: http://arxiv.org/licenses/nonexclusive-distrib/1.0
DBID	AKY GOX
DOI	10.48550/arxiv.2406.07845
DatabaseName	arXiv Computer Science arXiv.org
DatabaseTitleList
Database_xml	– sequence: 1 dbid: GOX name: arXiv.org url: http://arxiv.org/find sourceTypes: Open Access Repository
DeliveryMethod	fulltext_linktorsrc
ExternalDocumentID	2406_07845
GroupedDBID	AKY GOX
ID	FETCH-LOGICAL-a675-11ae166d9414e16b61501362a21224186942094e2aa6764822333d53e85d9ee63
IEDL.DBID	GOX
IngestDate	Tue Jun 18 04:50:34 EDT 2024
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	false
IsScholarly	false
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-a675-11ae166d9414e16b61501362a21224186942094e2aa6764822333d53e85d9ee63
OpenAccessLink	https://arxiv.org/abs/2406.07845
ParticipantIDs	arxiv_primary_2406_07845
PublicationCentury	2000
PublicationDate	2024-06-11
PublicationDateYYYYMMDD	2024-06-11
PublicationDate_xml	– month: 06 year: 2024 text: 2024-06-11 day: 11
PublicationDecade	2020
PublicationYear	2024
Score	1.9253069
SecondaryResourceType	preprint
SourceID	arxiv
SourceType	Open Access Repository
SubjectTerms	Computer Science - Sound
Title	Target Speaker Extraction with Curriculum Learning
URI	https://arxiv.org/abs/2406.07845
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
link.rule.ids	228,230,783,888
linkProvider	Cornell University