Multi-Modal Beat Alignment Transformers for Dance Quality Assessment Framework
Published in | Journal of Multimedia Information System, Vol. 11, No. 2, pp. 149–156 |
---|---|
Format | Journal Article |
Language | English |
Published | 한국멀티미디어학회 (Korea Multimedia Society), 30.06.2024 |
Summary: In recent years, the dance entertainment industry has been growing as consumers want to learn and improve their dancing skills. To meet this need, dancers must be evaluated and given feedback, but such evaluation depends heavily on professional dancers. With the advent of deep learning techniques that can understand and learn the structure of 3D skeletons, graph convolutional networks and transformers have shown performance improvements in 3D human action understanding. In this paper, we propose the Dance Quality Assessment (DanceQA) Framework, which evaluates a dance performance and predicts its dance quality numerically. For problem definition, we collect and capture 3D skeletal data with a 3D pose estimator and label its dance quality. By analyzing the dataset, we propose two dance quality measures, kinematic information entropy and multi-modal beat similarity, which reflect traditional criteria for dance technique. Based on the results of these measures, a kinematic entropy embedding matrix and multi-modal beat alignment transformers are designed to learn salient joints and frames in a 3D dance sequence. We then design the overall network architecture, the DanceQA transformers, which consider the spatial and temporal characteristics of a 3D dance sequence from multiple input features, and demonstrate that the proposed transformers outperform other graph convolutional networks (GCNs) and transformers on the DanceQA dataset. In numerous experiments, the proposed transformers outperform previous methods, graph convolutional networks and multi-modal transformers, by up to 0.146 in correlation coefficient.
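The abstract names two quality measures, kinematic information entropy and multi-modal beat similarity, without defining them here. The sketch below is a minimal illustration of how such measures could be computed, assuming per-joint speed histograms for the entropy and a Gaussian kernel over distances between detected motion beats and music beats for the similarity; the function names, the local-minimum beat detector, and all parameters (`bins`, `fps`, `sigma`) are assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def kinematic_entropy(joints, bins=32):
    """Shannon entropy of each joint's speed distribution.

    joints: (T, J, 3) array of 3D joint positions over T frames.
    Returns a (J,) vector; higher values suggest more varied motion.
    """
    speeds = np.linalg.norm(np.diff(joints, axis=0), axis=-1)  # (T-1, J)
    entropies = []
    for j in range(speeds.shape[1]):
        hist, _ = np.histogram(speeds[:, j], bins=bins)
        p = hist / max(hist.sum(), 1)  # normalize to a probability mass
        p = p[p > 0]                   # drop empty bins before log
        entropies.append(float(-(p * np.log2(p)).sum()))
    return np.array(entropies)

def motion_beats(joints):
    """Frames where total joint speed hits a local minimum,
    a common proxy for kinematic beats in dance-motion work."""
    speed = np.linalg.norm(np.diff(joints, axis=0), axis=-1).sum(axis=1)
    is_min = (speed[1:-1] < speed[:-2]) & (speed[1:-1] < speed[2:])
    return np.where(is_min)[0] + 1

def beat_similarity(joints, music_beats, fps=30.0, sigma=0.1):
    """Mean Gaussian alignment between each music beat (in seconds)
    and its nearest motion beat."""
    mb = motion_beats(joints) / fps  # motion beat times in seconds
    if len(mb) == 0:
        return 0.0
    return float(np.mean([np.exp(-((mb - t) ** 2).min() / (2 * sigma ** 2))
                          for t in music_beats]))

# Toy usage on random motion; real input would come from a 3D pose estimator.
rng = np.random.default_rng(0)
joints = rng.normal(size=(300, 24, 3))   # ~10 s of 24-joint motion at 30 fps
music_beats = np.arange(0.5, 10.0, 0.5)  # a beat every 0.5 s (120 BPM)
print(kinematic_entropy(joints).mean(), beat_similarity(joints, music_beats))
```

Detecting motion beats as local minima of aggregate joint speed and scoring alignment with a Gaussian kernel follows the beat-alignment scores commonly used in music-driven dance generation; the paper may well use a different detector or kernel.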
ISSN: 2383-7632
DOI: 10.33851/JMIS.2024.11.2.149