Classification of Urban Sound using Sequential Convolutional Neural Network (CNN) Model and its Visualisation

Bibliographic Details
Published in: 2024 IEEE International Conference on Information Technology, Electronics and Intelligent Communication Systems (ICITEICS), pp. 1-5
Main Authors: Agarwal, Muskan; Gill, Kanwarpartap Singh; Chattopadhyay, Saumitra; Singh, Mukesh
Format: Conference Proceeding
Language: English
Published: IEEE, 28.06.2024
Summary: The primary objective of this research is to classify urban noise using a Sequential Convolutional Neural Network (CNN) model, a deep learning framework well suited to processing audio signals. Precise urban sound categorization is crucial for public safety, environmental monitoring, and the advancement of smart cities. By extracting hierarchical features from audio input with a sequential CNN architecture, the proposed technique recognises a wide range of urban noises. As part of the study, we assemble a large collection of urban sounds covering a diverse range of ambient noises commonly found in cities, and we use this dataset to train the Sequential CNN model. A major strength of the model is its ability to learn hierarchical representations of audio data autonomously. We evaluate how well the model distinguishes between urban sound classes using predefined criteria. To investigate the internal representations learnt by the CNN model, the study also employs visualisation approaches, which help explain the model's decision-making process and the features that differentiate kinds of urban sound. Techniques such as gradient-weighted class activation mapping (Grad-CAM) are used to create heatmaps that identify which parts of the input spectrogram most strongly influence the model's predictions. The findings demonstrate the practical usability of the Sequential CNN model, which classifies urban noises with an accuracy of 86%. The visualisation tools can increase trust and acceptance in real-world applications by making the model's decision-making process easier to interpret. This research advances the development of intelligent systems that can assess city sounds, and it indicates that combining deep learning models with interpretability tools makes them more trustworthy and easier to use.
DOI:10.1109/ICITEICS61368.2024.10625578
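
Note on the Grad-CAM step mentioned in the summary: the record does not give the authors' implementation, so the following is only a minimal sketch of the standard Grad-CAM combination step, written in plain Python. It assumes the feature maps of one convolutional layer and the gradients of the class score with respect to those maps have already been computed by whatever framework trained the model. The function name grad_cam and the toy values are hypothetical. In practice the resulting map has the resolution of the convolutional layer and would be upsampled to the spectrogram size before being overlaid.

```python
def grad_cam(activations, gradients):
    """Combine feature maps into a normalised Grad-CAM heatmap.

    activations: list of K feature maps, each an H x W list of floats.
    gradients:   same shape; gradient of the class score with respect to
                 each activation (assumed to be precomputed elsewhere).
    """
    height = len(activations[0])
    width = len(activations[0][0])
    heatmap = [[0.0] * width for _ in range(height)]

    for fmap, grad in zip(activations, gradients):
        # Channel weight: global average of the gradients for this map.
        alpha = sum(sum(row) for row in grad) / (height * width)
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += alpha * fmap[i][j]

    # ReLU keeps only regions that push the class score up.
    heatmap = [[max(0.0, v) for v in row] for row in heatmap]

    # Scale to [0, 1] for display; skipped if the map is all zeros.
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[v / peak for v in row] for row in heatmap]
    return heatmap


# Toy example: two 2x2 feature maps with constant gradients +1 and -1.
toy_activations = [[[1.0, 2.0], [3.0, 4.0]], [[4.0, 3.0], [2.0, 1.0]]]
toy_gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
toy_heatmap = grad_cam(toy_activations, toy_gradients)
```

In the toy example the bottom row of the map is the only region with a positive weighted sum, so it is the only region highlighted after the ReLU.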