Target sound information extraction: Speech and audio processing with neural networks conditioned on target clues
Published in | Acoustical Science and Technology, Vol. 46, no. 3, pp. 197-209 |
---|---|
Format | Journal Article |
Language | English |
Published | Tokyo: Acoustical Society of Japan (一般社団法人 日本音響学会), 01.05.2025; Japan Science and Technology Agency |
ISSN | 1346-3969 1347-5177 |
DOI | 10.1250/ast.e24.124 |
Summary: | This paper overviews neural target sound information extraction (TSIE), which consists of extracting the desired information about a sound source in an observed sound mixture given clues about the target source. TSIE is a general framework that covers various applications, such as target speech/sound extraction (TSE), personalized voice activity detection (PVAD), and target speaker automatic speech recognition (TS-ASR). We formalize the ideas of TSIE and show how it can be implemented through various examples such as TSE, PVAD, and TS-ASR. We conclude the paper with a discussion of potential future research directions. |
---|---|