Deep Neural Network approaches for Analysing Videos of Music Performances

Bibliographic Details
Published in: arXiv.org
Main Authors: Liwicki, Foteini Simistira; Upadhyay, Richa; Chhipa, Prakash Chandra; Murphy, Killian; Visi, Federico; Östersjö, Stefan; Liwicki, Marcus
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 24.05.2022

Summary: This paper presents a framework to automate the labelling of gestures in musical performance videos with a 3D Convolutional Neural Network (CNN). While this idea was proposed in a previous study, this paper introduces several novelties: (i) it presents a novel method that overcomes the class-imbalance challenge and makes learning possible for co-existent gestures, using a batch-balancing approach and spatial-temporal representations of gestures; (ii) it performs a detailed study on 7 and 18 categories of gestures generated during video-recorded guitar performances of musical pieces; (iii) it investigates the possibility of using audio features; (iv) it extends the analysis to multiple videos. The novel methods significantly improve gesture identification by 12% over the previous work (51% in this study versus 39% previously). We successfully validate the proposed methods on 7 super classes (72%), an ensemble of the 18 gestures/classes, and additional videos (75%).
ISSN:2331-8422
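The batch-balancing idea mentioned in the summary can be sketched as follows. This is an illustrative assumption, not the authors' implementation: it simply oversamples minority gesture classes with replacement so that every training batch contains an equal number of examples per class. The function name `balanced_batches` and all parameters are hypothetical.

```python
import random
from collections import defaultdict

def balanced_batches(labels, batch_size, num_batches, seed=0):
    """Yield batches of example indices with equal per-class counts.

    Minority classes are oversampled with replacement, so each batch
    contains batch_size // num_classes examples of every class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    classes = sorted(by_class)
    per_class = batch_size // len(classes)
    for _ in range(num_batches):
        batch = []
        for c in classes:
            # choices() samples with replacement, so even a class with
            # only 2 examples can fill its per-class quota
            batch.extend(rng.choices(by_class[c], k=per_class))
        rng.shuffle(batch)
        yield batch

# Toy imbalance: 100 / 5 / 2 examples across 3 gesture classes
labels = [0] * 100 + [1] * 5 + [2] * 2
for batch in balanced_batches(labels, batch_size=9, num_batches=3):
    print(len(batch))  # prints 9: three indices from each of the 3 classes
```

In practice such a sampler would feed clip indices to the 3D CNN's data loader, so that gestures that rarely co-occur in the recordings still appear in every gradient step.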