Target sound information extraction: Speech and audio processing with neural networks conditioned on target clues

Bibliographic Details
Published in	Acoustical Science and Technology, Vol. 46, no. 3, pp. 197–209
Main Authors	Tawara, Naohiro, Sato, Hiroshi, Delcroix, Marc, Nakatani, Tomohiro, Araki, Shoko, Ashihara, Takanori, Moriya, Takafumi, Ochiai, Tsubasa
Format	Journal Article
Language	English
Published	Tokyo: Acoustical Society of Japan; Japan Science and Technology Agency, 1 May 2025
Subjects	Audio data; Audio processing; Automatic speech recognition; Information retrieval; Neural networks; Personalized voice activity detection; Sound sources; Speech processing; Speech recognition; Target detection; Target speaker automatic speech recognition; Target speech extraction; Voice activity detectors; Voice recognition
ISSN	1346-3969 1347-5177
DOI	10.1250/ast.e24.124

Summary:	This paper overviews neural target sound information extraction (TSIE), which consists of extracting the desired information about a sound source in an observed sound mixture, given clues about the target source. TSIE is a general framework that covers various applications, such as target speech/sound extraction (TSE), personalized voice activity detection (PVAD), and target speaker automatic speech recognition (TS-ASR). We formalize the ideas of TSIE and show how it can be implemented through examples such as TSE, PVAD, and TS-ASR. We conclude the paper with a discussion of potential future research directions.
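As a rough illustration of the TSIE idea in the summary (a mixture plus a clue about the target yields task-specific information about that target), the following is a toy sketch under strong assumptions, not the paper's method. All names (`clue_conditioned_mask`, `extract_target`, `personalized_vad`, `sharpness`, `bias`) are hypothetical. A hand-crafted cosine similarity between each mixture frame and a clue embedding stands in for the trained neural network that real TSIE systems use, and the mask is per frame rather than time-frequency. The point is only the structure: one clue-conditioned representation shared by a TSE-like head (masked output) and a PVAD-like head (per-frame target activity).

```python
import math
from typing import List, Sequence

Frame = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def clue_conditioned_mask(frames: Sequence[Frame], clue: Frame,
                          sharpness: float = 10.0, bias: float = 0.5) -> List[float]:
    """Hypothetical stand-in for a learned network: a weight in (0, 1) per frame,
    high when the frame's features resemble the target clue embedding."""
    return [sigmoid(sharpness * (cosine(f, clue) - bias)) for f in frames]


def extract_target(frames: Sequence[Frame], clue: Frame) -> List[Frame]:
    """TSE-like head (toy): scale each mixture frame by its clue-conditioned weight."""
    mask = clue_conditioned_mask(frames, clue)
    return [[m * x for x in f] for m, f in zip(mask, frames)]


def personalized_vad(frames: Sequence[Frame], clue: Frame,
                     threshold: float = 0.5) -> List[bool]:
    """PVAD-like head (toy): flag frames where the target appears to be active."""
    return [m >= threshold for m in clue_conditioned_mask(frames, clue)]


if __name__ == "__main__":
    clue = [1.0, 0.0]                              # toy enrollment embedding of the target
    frames = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]  # toy mixture features, one row per frame
    print(personalized_vad(frames, clue))          # -> [True, False, True]
    print(extract_target(frames, clue))
```

Sharing one conditioning step across both heads mirrors the framing of TSIE as a single framework covering several applications; in practice the similarity function, fusion, and output heads would all be learned.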