Capturing Discriminative Information Using a Deep Architecture in Acoustic Scene Classification

Bibliographic Details
Published in: Applied Sciences, Vol. 11, no. 18, p. 8361
Main Authors: Shim, Hye-jin; Jung, Jee-weon; Kim, Ju-ho; Yu, Ha-jin
Format: Journal Article
Language: English
Published: Basel: MDPI AG, 01.09.2021

Summary: Acoustic scene classification contains frequently misclassified pairs of classes that share many common acoustic properties. Specific details can provide vital clues for distinguishing such pairs, but these details are generally subtle and hard to generalize across different data distributions. In this study, we investigate methods for capturing discriminative information while simultaneously improving generalization. We adopt a max feature map method that replaces conventional non-linear activation functions in deep neural networks by applying an element-wise comparison between the different filters of a convolution layer’s output. Two data augmentation methods and two deep architecture modules are further explored to reduce overfitting and sustain the system’s discriminative power. Experiments on the “Detection and Classification of Acoustic Scenes and Events 2020 Task 1-A” dataset validate the proposed methods: the proposed system consistently outperforms the baseline, achieving an accuracy of 70.4% against the baseline’s 65.1%.
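
As a rough illustration of the max feature map (MFM) operation described in the summary, below is a minimal PyTorch sketch; the class name, layer sizes, and input shape are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn

class MaxFeatureMap(nn.Module):
    # Competitive activation: split the channel dimension in half and
    # keep the element-wise maximum of the two halves, so the layer
    # selects between competing convolution filters instead of
    # applying a fixed non-linearity such as ReLU.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = torch.chunk(x, 2, dim=1)  # (batch, 2C, H, W) -> two (batch, C, H, W)
        return torch.max(a, b)           # element-wise comparison between filters

# Usage: a convolution followed by MFM halves the channel count.
conv = nn.Conv2d(1, 64, kernel_size=3, padding=1)  # 64 filters
mfm = MaxFeatureMap()                               # 64 -> 32 channels
x = torch.randn(8, 1, 40, 100)                      # e.g., a batch of mel-spectrograms
y = mfm(conv(x))
print(y.shape)  # torch.Size([8, 32, 40, 100])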
ISSN: 2076-3417
DOI: 10.3390/app11188361