Capturing Discriminative Information Using a Deep Architecture in Acoustic Scene Classification

Bibliographic Details
Published in: Applied Sciences, Vol. 11, no. 18, p. 8361
Main Authors: Shim, Hye-jin; Jung, Jee-weon; Kim, Ju-ho; Yu, Ha-jin
Format: Journal Article
Language: English
Published: Basel: MDPI AG, 01.09.2021

Summary: Acoustic scene classification contains frequently misclassified pairs of classes that share many common acoustic properties. Specific details can provide vital clues for distinguishing such pairs, but these details are generally subtle and hard to generalize across different data distributions. In this study, we investigate methods for capturing discriminative information while simultaneously improving generalization. We adopt a max feature map method that replaces conventional non-linear activation functions in deep neural networks by applying an element-wise comparison between the different filters of a convolution layer’s output. Two data augmentation methods and two deep architecture modules are further explored to reduce overfitting and sustain the system’s discriminative power. Experiments on the “Detection and Classification of Acoustic Scenes and Events 2020 Task 1-A” dataset validate the proposed methods: the proposed system consistently outperforms the baseline, achieving an accuracy of 70.4% against the baseline’s 65.1%.
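
As a rough illustration of the max feature map (MFM) operation described in the summary, below is a minimal PyTorch sketch; the class name, layer sizes, and input shape are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn

class MaxFeatureMap(nn.Module):
    # Competitive activation: split the channel dimension in half and
    # keep the element-wise maximum of the two halves, so the layer
    # selects between competing convolution filters instead of
    # applying a fixed non-linearity such as ReLU.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = torch.chunk(x, 2, dim=1)  # (batch, 2C, H, W) -> two (batch, C, H, W)
        return torch.max(a, b)           # element-wise comparison between filters

# Usage: a convolution followed by MFM halves the channel count.
conv = nn.Conv2d(1, 64, kernel_size=3, padding=1)  # 64 filters
mfm = MaxFeatureMap()                               # 64 -> 32 channels
x = torch.randn(8, 1, 40, 100)                      # e.g., a batch of mel-spectrograms
y = mfm(conv(x))
print(y.shape)  # torch.Size([8, 32, 40, 100])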
ISSN: 2076-3417
DOI: 10.3390/app11188361