Time-Frequency and Framewise Self-Attention-based DNN with GradCAM++ for Noise-Resilient Environmental Sound Classification
Published in | 2024 9th International Conference on Communication and Electronics Systems (ICCES) pp. 473 - 478 |
---|---|
Main Authors | , , , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE 16.12.2024 |
DOI | 10.1109/ICCES63552.2024.10859684 |
Summary: | In robust IoT-based system design, it is hard to make decisions that incorporate environmental sound features and to build support for dynamic, noise-resilient environments into the system. The use of Environmental Sound Classification (ESC) in applications such as smart surveillance, monitoring, and healthcare is gaining research attention. This paper proposes a novel architecture, the Time-Frequency and Framewise Self-Attention-Based DNN with GradCAM++ for Noise-Resilient Environmental Sound Classification, which aims to minimize noise in sound spectrograms and focus on the distinguishing sound details. The paper also incorporates GradCAM++ to refine spatial feature localization and EfficientNetV2 to improve computational performance. With these state-of-the-art techniques, the model achieved 93.01% accuracy on the UrbanSound8K dataset, surpassing baseline results. |
---|---|