Deep Neural Network approaches for Analysing Videos of Music Performances

Bibliographic Details
Published in: arXiv.org
Main Authors: Liwicki, Foteini Simistira; Upadhyay, Richa; Chhipa, Prakash Chandra; Murphy, Killian; Visi, Federico; Östersjö, Stefan; Liwicki, Marcus
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 24.05.2022

Summary: This paper presents a framework to automate the labelling of gestures in musical performance videos with a 3D Convolutional Neural Network (CNN). While this idea was proposed in a previous study, this paper introduces several novelties: (i) it presents a novel method that overcomes the class-imbalance challenge and makes learning possible for co-existent gestures, using a batch-balancing approach and spatial-temporal representations of gestures; (ii) it performs a detailed study on 7 and 18 categories of gestures generated during video-recorded guitar performances of musical pieces; (iii) it investigates the possibility of using audio features; (iv) it extends the analysis to multiple videos. The novel methods significantly improve gesture identification by 12% over the previous work (51% in this study versus 39% previously). We successfully validate the proposed methods on 7 super classes (72%), an ensemble of the 18 gestures/classes, and additional videos (75%).
ISSN:2331-8422
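The batch-balancing idea mentioned in the summary can be sketched as follows. This is an illustrative assumption, not the authors' implementation: it simply oversamples minority gesture classes with replacement so that every training batch contains an equal number of examples per class. The function name `balanced_batches` and all parameters are hypothetical.

```python
import random
from collections import defaultdict

def balanced_batches(labels, batch_size, num_batches, seed=0):
    """Yield batches of example indices with equal per-class counts.

    Minority classes are oversampled with replacement, so each batch
    contains batch_size // num_classes examples of every class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    classes = sorted(by_class)
    per_class = batch_size // len(classes)
    for _ in range(num_batches):
        batch = []
        for c in classes:
            # choices() samples with replacement, so even a class with
            # only 2 examples can fill its per-class quota
            batch.extend(rng.choices(by_class[c], k=per_class))
        rng.shuffle(batch)
        yield batch

# Toy imbalance: 100 / 5 / 2 examples across 3 gesture classes
labels = [0] * 100 + [1] * 5 + [2] * 2
for batch in balanced_batches(labels, batch_size=9, num_batches=3):
    print(len(batch))  # prints 9: three indices from each of the 3 classes
```

In practice such a sampler would feed clip indices to the 3D CNN's data loader, so that gestures that rarely co-occur in the recordings still appear in every gradient step.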