A Multiclass Semi-Supervised Deep Convolutional Generative Adversarial Network for Music Genre Classification Using Mel-Frequency Cepstral Coefficients
| Published in | 2024 International Conference on Intelligent and Innovative Technologies in Computing, Electrical and Electronics (IITCEE), pp. 1-6 |
|---|---|
| Main Authors | |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 24.01.2024 |
Summary: The growing consumer base and expanding market for various music styles highlight the necessity of classifying music genres to cater to listeners' preferences. Manually ranking music is a labor-intensive process for listeners, prompting the need for a more efficient approach. The proposed approach extracts Mel-frequency cepstral coefficient (MFCC) feature maps from the log Mel-spectrograms of audio clips. The extracted feature maps are supplied to a multiclass semi-supervised deep convolutional generative adversarial network in which the discriminator behaves as a classifier. Training uses the standardized GTZAN dataset, a publicly accessible collection of 1,000 audio clips spanning ten genres, of which 80% are used for training and 20% for testing. Finally, the paper discusses the performance of the semi-supervised deep convolutional generative adversarial network under the RMSprop and Adam optimizers on the original and the augmented labeled and unlabeled MFCC feature maps. Without data augmentation, the discriminator achieves a training accuracy of 97.9% but a test accuracy of only about 45.67%; with data augmentation, its training accuracy is about 98.3% and its test accuracy reaches 84.75%.
DOI: 10.1109/IITCEE59897.2024.10467652
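
The pipeline the summary describes begins with MFCC feature maps computed from log Mel-spectrograms of 30-second GTZAN clips. Below is a minimal sketch of that step, assuming the librosa library; the helper name `mfcc_feature_map` and the sampling rate, Mel-band count, MFCC count, and hop length are illustrative assumptions, not settings reported in the paper.

```python
# Minimal sketch of MFCC feature-map extraction from a log Mel-spectrogram,
# assuming librosa. All hyperparameters below are illustrative choices,
# not the paper's reported settings.
import librosa
import numpy as np

def mfcc_feature_map(path, sr=22050, n_mels=128, n_mfcc=20, hop_length=512):
    # Load a GTZAN-style 30-second clip as a mono waveform.
    y, sr = librosa.load(path, sr=sr)
    # Mel-scaled power spectrogram, converted to decibels (log Mel-spectrogram).
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels,
                                         hop_length=hop_length)
    log_mel = librosa.power_to_db(mel)
    # MFCCs computed from the log Mel-spectrogram (S= expects dB input).
    mfcc = librosa.feature.mfcc(S=log_mel, sr=sr, n_mfcc=n_mfcc)
    return mfcc.astype(np.float32)  # shape: (n_mfcc, frames)
```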
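
The second step the summary names, a discriminator that doubles as a classifier, is commonly realized with a K+1-way output: K genre logits plus one extra logit for generated samples. A hedged PyTorch sketch under that assumption follows; the architecture, layer sizes, and Adam hyperparameters are illustrative rather than the paper's network, and only the supervised and fake-sample loss terms of the full semi-supervised objective are shown.

```python
# Sketch of a semi-supervised GAN discriminator-as-classifier in PyTorch.
# This is a generic illustration of the K+1-class technique, not the
# paper's exact architecture; layer sizes, the extra "fake" class, and
# the Adam hyperparameters are assumptions.
import torch
import torch.nn as nn

NUM_GENRES = 10  # the ten GTZAN genres

class Discriminator(nn.Module):
    def __init__(self, in_channels=1, num_classes=NUM_GENRES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
        )
        # K genre logits plus one extra logit for "generated/fake".
        self.head = nn.Linear(64, num_classes + 1)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.head(h)

disc = Discriminator()
opt = torch.optim.Adam(disc.parameters(), lr=2e-4, betas=(0.5, 0.999))
ce = nn.CrossEntropyLoss()

# Supervised step on a labeled MFCC feature-map batch (shape: N, 1, H, W).
x_lab = torch.randn(8, 1, 20, 64)           # placeholder batch
y_lab = torch.randint(0, NUM_GENRES, (8,))  # genre labels 0..9
loss_sup = ce(disc(x_lab), y_lab)

# Generated samples are pushed toward the extra "fake" class (index
# NUM_GENRES); the real-unlabeled term of the full semi-supervised
# objective is omitted here for brevity.
x_fake = torch.randn(8, 1, 20, 64)          # stand-in for generator output
loss_fake = ce(disc(x_fake), torch.full((8,), NUM_GENRES, dtype=torch.long))
(loss_sup + loss_fake).backward()
opt.step()
```

Swapping `torch.optim.RMSprop` for Adam in this sketch mirrors the optimizer comparison the paper reports.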