Multi-granularity acoustic information fusion for sound event detection


Bibliographic Details
Published in Signal Processing Vol. 227; p. 109691
Main Authors Yin, Han; Chen, Jianfeng; Bai, Jisheng; Wang, Mou; Rahardja, Susanto; Shi, Dongyuan; Gan, Woon-seng
Format Journal Article
Language English
Published Elsevier B.V. 01.02.2025
Summary: Most previous works on sound event detection (SED) are based on binary hard labels of sound events, leaving other scales of information underexplored. To address this problem, we introduce multiple granularities of knowledge into the system to perform hierarchical acoustic information fusion for SED. Specifically, we present an interactive dual-conformer (IDC) module to adaptively fuse the medium-grained and fine-grained acoustic information based on the hard and soft labels of sound events. In addition, we propose a scene-dependent mask estimator (SDME) module to extract the coarse-grained information from acoustic scenes, introducing the scene-event relationships into the SED system. Experimental results show that the proposed IDC and SDME modules efficiently fuse the acoustic information at different scales and therefore further improve the SED performance. The proposed system achieved Top 1 performance in DCASE 2023 Challenge Task 4B.
• A system to fuse acoustic information with different granularities to improve the performance of SED.
• An interactive dual-conformer module to extract information from soft and hard labels of sound events.
• A scene-dependent mask estimator to introduce scene-event relationships into the SED system.
ISSN: 0165-1684
DOI: 10.1016/j.sigpro.2024.109691