The Application and Improvement of Deep Neural Networks in Environmental Sound Recognition

Bibliographic Details
Published in: Applied Sciences, Vol. 10, No. 17, p. 5965
Main Authors: Lin, Yu-Kai; Su, Mu-Chun; Hsieh, Yi-Zeng
Format: Journal Article
Language: English
Published: Basel: MDPI AG, 01.09.2020
ISSN: 2076-3417
DOI: 10.3390/app10175965

Summary: Neural networks have achieved strong results in sound recognition, and many different kinds of acoustic features have been tried as training input for such networks. However, there is still doubt about whether a neural network can efficiently extract features from a raw audio signal input. This study improved on the raw-signal-input networks of prior research by using deeper network architectures, which allowed the raw signals to be analyzed more effectively. We also discussed several network settings, and with a spectrogram-like conversion our network reached an accuracy of 73.55% on the open audio dataset "Dataset for Environmental Sound Classification 50" (ESC50). This study also proposed a network architecture that can combine sub-networks fed with different features. With the help of global pooling, a flexible fusion method was integrated into the network. Our experiment successfully combined two different networks with different audio feature inputs (a raw audio signal and the log-mel spectrum). With these settings, the proposed ParallelNet reached an accuracy of 81.55% on ESC50, which is comparable to human recognition performance.
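The fusion idea described in the summary, two convolutional branches collapsed by global average pooling into fixed-size vectors and concatenated before a shared classifier, can be pictured with a short sketch. The PyTorch code below is a minimal illustration under assumed settings: the layer counts, kernel sizes, input shapes, and the name ParallelNetSketch are placeholders, not the authors' published ParallelNet configuration.

# Minimal sketch of the two-branch fusion described in the summary.
# Layer sizes and depths are illustrative assumptions only.
import torch
import torch.nn as nn

class ParallelNetSketch(nn.Module):
    def __init__(self, num_classes: int = 50):  # ESC50 has 50 classes
        super().__init__()
        # Branch 1: 1D convolutions over the raw waveform.
        self.raw_branch = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=64, stride=8), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=16, stride=4), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=8, stride=2), nn.ReLU(),
        )
        # Branch 2: 2D convolutions over the log-mel spectrogram.
        self.mel_branch = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Global average pooling collapses each branch to a fixed-size
        # vector regardless of input length, which is what makes the
        # fusion across different feature types flexible.
        self.raw_pool = nn.AdaptiveAvgPool1d(1)
        self.mel_pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(128 + 128, num_classes)

    def forward(self, raw: torch.Tensor, mel: torch.Tensor) -> torch.Tensor:
        # raw: (batch, 1, samples); mel: (batch, 1, n_mels, frames)
        r = self.raw_pool(self.raw_branch(raw)).flatten(1)  # (batch, 128)
        m = self.mel_pool(self.mel_branch(mel)).flatten(1)  # (batch, 128)
        return self.classifier(torch.cat([r, m], dim=1))    # (batch, 50)

# Example: a 5-second clip at 44.1 kHz and a 128-band log-mel spectrogram.
model = ParallelNetSketch()
logits = model(torch.randn(2, 1, 220500), torch.randn(2, 1, 128, 431))
print(logits.shape)  # torch.Size([2, 50])

Because adaptive global pooling removes any dependence on input length, either branch (or an additional feature branch) can be swapped without changing the classifier, which is the flexibility the summary attributes to global pooling.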