Towards Robust Local Key Estimation with a Musically Inspired Neural Network

Bibliographic Details
Published in 2024 32nd European Signal Processing Conference (EUSIPCO) pp. 26 - 30
Main Authors Ding, Yiwei; Weiß, Christof
Format Conference Proceeding
Language English
Published European Association for Signal Processing - EURASIP 26.08.2024
Summary: Local key estimation from music audio recordings is a challenging task. Due to its complexity and inherent ambiguity, machine-learning methods often overfit to specific pieces and their annotations, therefore lacking robustness and generalizability. Based on a previous case study on the Schubert Winterreise dataset, this paper aims to build a robust local key estimation method. To this end, we propose a novel neural network architecture (OctaveNet), which is inspired by the musical relationship of frequency bins in the constant-Q transform (CQT) and the ability of recurrent layers to process sequential data. OctaveNet rearranges the CQT spectrogram in two different ways, processes each of the branches with convolutional and recurrent layers, and finally fuses the two feature maps to predict the local key. Our results show that, while having fewer parameters, OctaveNet achieves a substantial improvement over previous methods, especially for unseen songs, which indicates its stronger generalizability.
ISSN: 2076-1465
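
Illustration: The summary states that OctaveNet rearranges the CQT spectrogram in two different ways but does not say which. The sketch below shows one plausible pair of rearrangements and should not be read as the authors' implementation. It assumes a CQT magnitude array of shape (n_bins, n_frames) with the lowest bin first and a whole number of octaves; the function name `fold_cqt` and the parameter `bins_per_octave` are hypothetical. Because CQT bins are spaced geometrically, bins that lie `bins_per_octave` apart share a pitch class, so the frequency axis can be folded into an octave axis and a pitch-class axis.

```python
import numpy as np


def fold_cqt(cqt, bins_per_octave=12):
    """Fold a CQT array into two hypothetical layouts (assumed, not from the paper).

    cqt: array of shape (n_bins, n_frames), lowest frequency bin first.
    Returns:
      by_octave      -- shape (n_octaves, bins_per_octave, n_frames)
      by_pitch_class -- shape (bins_per_octave, n_octaves, n_frames)
    """
    n_bins, n_frames = cqt.shape
    if n_bins % bins_per_octave != 0:
        raise ValueError("n_bins must be a multiple of bins_per_octave")
    n_octaves = n_bins // bins_per_octave

    # Bin k maps to octave k // bins_per_octave and pitch class k % bins_per_octave.
    by_octave = cqt.reshape(n_octaves, bins_per_octave, n_frames)

    # Same data with pitch class as the leading axis, so that all octaves
    # of one pitch class sit next to each other.
    by_pitch_class = by_octave.transpose(1, 0, 2)
    return by_octave, by_pitch_class


# Toy input: 2 octaves x 12 bins, 3 frames.
toy = np.arange(24 * 3).reshape(24, 3)
view_a, view_b = fold_cqt(toy)
```

In a two-branch model of the kind the summary describes, each layout could be passed to its own convolutional and recurrent stack before the feature maps are fused; those stages are omitted here because the record gives no details about them.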