Time-Frequency and Framewise Self-Attention-based DNN with GradCAM++ for Noise-Resilient Environmental Sound Classification

Bibliographic Details
Published in: 2024 9th International Conference on Communication and Electronics Systems (ICCES), pp. 473-478
Main Authors: R, Brindha; Ashithosh, Chintala; Kaduru, Sai Pradeep; R.K., Pongiannan; U, Poornima P; M, Pemila
Format: Conference Proceeding
Language: English
Published: IEEE, 16.12.2024
DOI: 10.1109/ICCES63552.2024.10859684

Summary: In robust IoT-based system design, it is difficult both to make decisions that incorporate environmental sound features and to account for dynamic, noisy environments. The use of Environmental Sound Classification (ESC) in applications such as smart surveillance, monitoring, and healthcare is gaining research attention. This paper proposes a novel architecture, the Time-Frequency and Framewise Self-Attention-Based DNN with GradCAM++ for Noise-Resilient Environmental Sound Classification, which suppresses noise in sound spectrograms and focuses on the most distinctive sound details. The paper further introduces GradCAM++ to refine spatial feature localization and EfficientNetV2 to improve computational efficiency. Incorporating these state-of-the-art techniques, the model achieved 93.01% accuracy on the UrbanSound8K dataset, outperforming the baseline results.
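The summary does not say how the time-frequency and framewise self-attention blocks are built. The following is a minimal NumPy sketch, assuming standard scaled dot-product self-attention applied along two axes of a spectrogram: once with each time frame as a token (framewise) and once with each frequency bin as a token. The helper names `framewise_attention` and `frequency_attention`, the random projection matrices, and the 64 x 128 input shape are hypothetical illustrations, not the authors' architecture; in a trained model the projections would be learned.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of tokens.

    tokens: (n_tokens, d_in); w_q, w_k, w_v: (d_in, d_model).
    Returns (output of shape (n_tokens, d_model), weights of shape (n_tokens, n_tokens)).
    """
    q = tokens @ w_q
    k = tokens @ w_k
    v = tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # similarity between every pair of tokens
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights

def framewise_attention(spec, rng, d_model=32):
    # Hypothetical helper: each time frame (a column of F frequency bins) is one token.
    # Random projections stand in for learned weights (assumption).
    n_freq, _ = spec.shape
    w = [rng.standard_normal((n_freq, d_model)) / np.sqrt(n_freq) for _ in range(3)]
    return self_attention(spec.T, *w)

def frequency_attention(spec, rng, d_model=32):
    # Hypothetical helper: each frequency bin (a row of T frames) is one token.
    _, n_frames = spec.shape
    w = [rng.standard_normal((n_frames, d_model)) / np.sqrt(n_frames) for _ in range(3)]
    return self_attention(spec, *w)

rng = np.random.default_rng(0)
spec = rng.standard_normal((64, 128))  # stand-in for a 64-bin x 128-frame log-mel spectrogram
frame_out, frame_w = framewise_attention(spec, rng)
freq_out, freq_w = frequency_attention(spec, rng)
print(frame_out.shape, frame_w.shape)  # (128, 32) (128, 128)
print(freq_out.shape, freq_w.shape)    # (64, 32) (64, 64)
```

Attending across frames is intended to let a model weight informative frames above noise-dominated ones, while attending across frequency bins does the same for spectral bands. How the paper combines these two views with GradCAM++ and the EfficientNetV2 backbone is not described in this record.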