Loss Function Approaches for Multi-label Music Tagging


Bibliographic Details
Published in: 2021 International Conference on Content-Based Multimedia Indexing (CBMI), pp. 1-4
Main Authors: Knox, Dillon; Greer, Timothy; Ma, Benjamin; Kuo, Emily; Somandepalli, Krishna; Narayanan, Shrikanth
Format: Conference Proceeding
Language: English
Published: IEEE, 28.06.2021

Summary: Given the ever-increasing volume of music created and released every day, it has never been more important to study automatic music tagging. In this paper, we present an ensemble-based convolutional neural network (CNN) model trained using various loss functions for tagging musical genres from audio. We investigate the effect of different loss functions and resampling strategies on prediction performance, finding that using focal loss improves overall performance on the MTG-Jamendo dataset: an imbalanced, multi-label dataset of over 18,000 public-domain songs with 57 labels. Additionally, we report results from varying the receptive field of our base classifier (a CNN-based architecture trained on Mel spectrograms), which also boosts model performance and yields state-of-the-art results on the Jamendo dataset. We conclude that the choice of loss function is paramount for improving on existing methods in music tagging, particularly in the presence of class imbalance.
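The abstract credits focal loss for the gains on the imbalanced, multi-label MTG-Jamendo dataset. The paper's exact configuration is not given in this record, so the sketch below shows only the standard per-tag sigmoid focal loss (Lin et al.); the hyperparameter values `gamma=2.0` and `alpha=0.25` and the numpy formulation are assumptions, not the authors' settings.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Standard sigmoid focal loss for multi-label tagging (a sketch,
    not the paper's exact configuration).

    logits, targets: arrays of shape (batch, num_tags), targets in {0, 1}.
    Each tag is treated as an independent binary problem, as is usual
    for multi-label classification.
    """
    p = sigmoid(logits)
    # p_t: the model's probability for the true class of each entry
    p_t = np.where(targets == 1, p, 1.0 - p)
    # alpha_t: class-balancing weight for positive vs. negative entries
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    eps = 1e-12  # guard against log(0)
    # (1 - p_t)^gamma down-weights easy, already well-classified entries,
    # focusing the gradient on hard examples and rare tags
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t + eps)
    return loss.mean()
```

With `gamma=0` this reduces to alpha-weighted binary cross-entropy; raising `gamma` shrinks the contribution of confident correct predictions, which is why it helps under class imbalance.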
ISSN:1949-3991
DOI:10.1109/CBMI50038.2021.9461913