Target sound information extraction: Speech and audio processing with neural networks conditioned on target clues

Bibliographic Details
Published in Acoustical Science and Technology Vol. 46; no. 3; pp. 197-209
Main Authors Tawara, Naohiro; Sato, Hiroshi; Delcroix, Marc; Nakatani, Tomohiro; Araki, Shoko; Ashihara, Takanori; Moriya, Takafumi; Ochiai, Tsubasa
Format Journal Article
Language English
Published Tokyo ACOUSTICAL SOCIETY OF JAPAN 01.05.2025
一般社団法人 日本音響学会
Japan Science and Technology Agency
ISSN 1346-3969
1347-5177
DOI 10.1250/ast.e24.124

Summary: This paper overviews neural target sound information extraction (TSIE), which consists of extracting the desired information about a sound source in an observed sound mixture given clues about the target source. TSIE is a general framework, which covers various applications, such as target speech/sound extraction (TSE), personalized voice activity detection (PVAD), target speaker automatic speech recognition (TS-ASR), etc. We formalize the ideas of TSIE and show how it can be implemented through various examples such as TSE, PVAD, and TS-ASR. We conclude the paper with a discussion of potential future research directions.