FMSG-JLESS Submission for DCASE 2024 Task4 on Sound Event Detection with Heterogeneous Training Dataset and Potentially Missing Labels
Main Authors | , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 28.06.2024 |
Summary: | This report presents the systems developed and submitted by Fortemedia
Singapore (FMSG) and Joint Laboratory of Environmental Sound Sensing (JLESS)
for DCASE 2024 Task 4. The task focuses on recognizing event classes and their
time boundaries, given that multiple events can be present and may overlap in
an audio recording. The novelty this year is a dataset with two sources, making
it challenging to achieve good performance without knowing the source of the
audio clips during evaluation. To address this, we propose a sound event
detection method based on domain generalization. Our approach integrates features
from Bidirectional Encoder representation from Audio Transformers (BEATs) with
those of a convolutional recurrent neural network (CRNN). We focus on three main strategies to
improve our method. First, we apply MixStyle along the frequency dimension to
adapt the mel-spectrograms from different domains. Second, we compute the
training loss separately for each dataset, restricted to that dataset's
corresponding classes. This independent learning framework helps the model
extract domain-specific features effectively. Lastly, we use the sound event
bounding boxes (SEBBs) method for post-processing. Our proposed method achieves
superior macro-average pAUC and polyphonic sound detection score (PSDS)
performance on the DCASE 2024 Challenge Task 4 validation dataset and public
evaluation dataset. |
---|---|
DOI: | 10.48550/arxiv.2407.00291 |
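The summary only names frequency-wise MixStyle. As a rough illustration, the NumPy sketch below follows the general MixStyle recipe: normalize each clip with its own per-frequency-bin statistics (taken over time), then re-scale it with statistics interpolated toward those of a randomly paired clip in the batch. It is not the authors' code; the function name `freq_mixstyle`, the defaults `alpha=0.3` and `p=0.7`, and the input layout (a batch of log-mel spectrograms shaped `(batch, n_mels, n_frames)`) are all hypothetical choices.

```python
import numpy as np


def freq_mixstyle(x, alpha=0.3, p=0.7, eps=1e-6, rng=None):
    """Hypothetical frequency-wise MixStyle for log-mel spectrograms.

    x: array of shape (batch, n_mels, n_frames).
    With probability p, each clip is normalized by its own per-frequency
    mean/std (computed over time) and re-styled with statistics mixed
    between itself and a randomly permuted partner clip.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() >= p:
        return x  # leave this batch unaugmented (happens with probability 1 - p)
    mu = x.mean(axis=2, keepdims=True)                # (B, F, 1): per-clip, per-bin mean
    sig = np.sqrt(x.var(axis=2, keepdims=True) + eps) # (B, F, 1): per-clip, per-bin std
    x_norm = (x - mu) / sig                           # remove each clip's spectral "style"
    lam = rng.beta(alpha, alpha, size=(x.shape[0], 1, 1))  # mixing weight per clip
    perm = rng.permutation(x.shape[0])                # partner clip for each clip
    mu_mix = lam * mu + (1.0 - lam) * mu[perm]
    sig_mix = lam * sig + (1.0 - lam) * sig[perm]
    return x_norm * sig_mix + mu_mix                  # re-apply the mixed style


# Toy usage with arbitrary sizes: 4 clips, 128 mel bins, 626 frames.
rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 128, 626))
augmented = freq_mixstyle(batch, rng=rng)
```

Because the statistics are computed per frequency bin over time, this kind of augmentation roughly changes the spectral coloration of a clip (for example, recording-channel characteristics) while leaving its temporal structure, and hence its event boundaries, in place.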
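The dataset-specific training loss can be sketched as below, assuming each training clip carries an integer id for its source dataset and each dataset annotates a disjoint list of class indices. `per_dataset_bce` is a hypothetical helper, not the authors' implementation: it computes a frame-level binary cross-entropy for each source dataset using only that dataset's clips and classes, then sums the per-dataset terms, so classes a clip's dataset never labels are not treated as negatives.

```python
import numpy as np


def per_dataset_bce(probs, targets, dataset_ids, classes_per_dataset, eps=1e-7):
    """Hypothetical per-dataset BCE restricted to each dataset's own classes.

    probs, targets: arrays of shape (batch, n_classes, n_frames).
    dataset_ids: int array of shape (batch,) giving each clip's source dataset.
    classes_per_dataset: list where entry d holds the class indices dataset d annotates.
    """
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    total = 0.0
    for d, cls in enumerate(classes_per_dataset):
        rows = dataset_ids == d
        if not rows.any():
            continue  # no clips from this dataset in the batch
        p = probs[rows][:, cls]    # only this dataset's clips and classes
        t = targets[rows][:, cls]
        bce = -(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))
        total += float(bce.mean())
    return total


# Toy usage: clip 0 comes from dataset 0 (classes 0-1), clip 1 from dataset 1 (classes 2-3).
probs = np.zeros((2, 4, 3))
probs[0, 2:, :] = 1.0  # confident outputs on classes clip 0's dataset does not annotate
probs[1, :2, :] = 1.0  # likewise for clip 1
targets = np.zeros((2, 4, 3))
loss = per_dataset_bce(probs, targets, np.array([0, 1]), [[0, 1], [2, 3]])
```

In the toy batch the loss stays close to zero: the confident outputs on unannotated classes are simply ignored, whereas a plain BCE over all classes would penalize them heavily.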