Language-queried target speech extraction using para-linguistic and non-linguistic prompts

Bibliographic Details
Published in: Acoustical Science and Technology, p. e25.27
Main Authors: Ito, Nobutaka; Yamauchi, Kazuki; Seki, Kentaro; Saito, Yuki; Saruwatari, Hiroshi; Okamoto, Yuki; Yamaoka, Kouei; Takamichi, Shinnosuke
Format: Journal Article
Language: English
Published: Acoustical Society of Japan, 2025
ISSN: 1346-3969; 1347-5177
DOI: 10.1250/ast.e25.27

Summary: This paper proposes a new language-queried target speech extraction (TSE) task called para-linguistic and non-linguistic text prompts-based TSE (PNTP-TSE), which uses text prompts that describe para-linguistic and non-linguistic information. This framework addresses the limitations of conventional TSE methods, such as privacy concerns in voiceprint-based systems and dependency on dedicated microphone arrays or video cameras. To support this framework, we construct and provide a new dataset, PromptTSE, which is specifically designed to facilitate various types of language-queried TSE, including PNTP-TSE. We develop a baseline method for PNTP-TSE and conduct experimental evaluations. The experimental results show that PNTP-TSE overcomes the performance degradation issue of voiceprint-based systems caused by the gap in speaking style between enrollment speech and target speech.