Language-queried target speech extraction using para-linguistic and non-linguistic prompts
Published in | Acoustical Science and Technology p. e25.27 |
---|---|
Format | Journal Article |
Language | English |
Published | ACOUSTICAL SOCIETY OF JAPAN, 2025 |
ISSN | 1346-3969 1347-5177 |
DOI | 10.1250/ast.e25.27 |
Summary: | This paper proposes a new language-queried target speech extraction (TSE) task called para-linguistic and non-linguistic text prompts-based TSE (PNTP-TSE), which uses text prompts that describe para-linguistic and non-linguistic information. This framework addresses the limitations of conventional TSE methods, such as privacy concerns in voiceprint-based systems and dependency on dedicated microphone arrays or video cameras. To support this framework, we construct and provide a new dataset, PromptTSE, which is specifically designed to facilitate various types of language-queried TSE, including PNTP-TSE. We develop a baseline method for PNTP-TSE and conduct experimental evaluations. The experimental results show that PNTP-TSE overcomes the performance degradation issue of voiceprint-based systems caused by the gap in speaking style between enrollment speech and target speech. |
---|---|