Time-Frequency and Framewise Self-Attention-based DNN with GradCAM++ for Noise-Resilient Environmental Sound Classification

Bibliographic Details
Published in	2024 9th International Conference on Communication and Electronics Systems (ICCES), pp. 473-478
Main Authors	R, Brindha; Ashithosh, Chintala; Kaduru, Sai Pradeep; R.K., Pongiannan; U, Poornima P; M, Pemila
Format	Conference Proceeding
Language	English
Published	IEEE 16.12.2024
Subjects	Accuracy; Acoustic Scene Classification; Attention Mechanisms; Computational modeling; Computer architecture; Convolutional Neural Networks (CNN); Costs; EfficientNetV2; Environmental Sound Classification (ESC); Frame-wise Self-Attention; GradCAM; Noise; Scene classification; Spectrogram; Surveillance; System analysis and design; Time-frequency analysis; Time-Frequency Attention
DOI	10.1109/ICCES63552.2024.10859684

Summary:	In robust IoT-based system design, it is hard to make decisions that incorporate environmental sound features and to account for dynamic, noisy environments. Environmental Sound Classification (ESC) in models for smart surveillance, monitoring, and healthcare is gaining research attention. This paper proposes a novel architecture, the Time-Frequency and Frame-wise Self-Attention-Based DNN with GradCAM++ for Noise-Resilient Environmental Sound Classification, which aims to suppress noise in sound spectrograms and focus on the distinguishing sound details. The paper also introduces GradCAM++ to refine spatial feature localization and EfficientNetV2 to improve computational performance. Incorporating these state-of-the-art techniques, the model reached 93.01% accuracy on the UrbanSound8K dataset, surpassing baseline results.