Multi-Modal Beat Alignment Transformers for Dance Quality Assessment Framework
Published in | Journal of Multimedia Information System, Vol. 11, No. 2, pp. 149–156 |
---|---|
Format | Journal Article |
Language | English |
Published | 한국멀티미디어학회 (Korea Multimedia Society), 30.06.2024 |
Summary: In recent years, the dance entertainment industry has been growing as consumers want to learn and improve their dancing skills. To meet this need, dancers must be evaluated and given feedback, but such evaluation depends heavily on professional dancers. With the advent of deep learning techniques that can understand and learn the structure of 3D skeletons, graph convolutional networks and transformers have shown performance improvements in 3D human action understanding. In this paper, we propose the Dance Quality Assessment (DanceQA) Framework, which evaluates a dance performance and predicts its dance quality numerically. For problem definition, we collect and capture 3D skeletal data with a 3D pose estimator and label its dance quality. By analyzing the dataset, we propose two dance quality measures, kinematic information entropy and multi-modal beat similarity, which reflect traditional criteria for dance technique. Based on the results of these measures, a kinematic entropy embedding matrix and multi-modal beat alignment transformers are designed to learn salient joints and frames in a 3D dance sequence. We then design the overall network architecture, the DanceQA transformers, which consider the spatial and temporal characteristics of a 3D dance sequence from multiple input features, and demonstrate that the proposed transformers outperform other graph convolutional networks (GCNs) and transformers on the DanceQA dataset. In numerous experiments, the proposed transformers outperform previous methods, graph convolutional networks and multi-modal transformers, by up to 0.146 in correlation coefficient.
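The abstract names two quality measures, kinematic information entropy and multi-modal beat similarity, without defining them here. The sketch below is a minimal illustration of how such measures could be computed, assuming per-joint speed histograms for the entropy and a Gaussian kernel over distances between detected motion beats and music beats for the similarity; the function names, the local-minimum beat detector, and all parameters (`bins`, `fps`, `sigma`) are assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def kinematic_entropy(joints, bins=32):
    """Shannon entropy of each joint's speed distribution.

    joints: (T, J, 3) array of 3D joint positions over T frames.
    Returns a (J,) vector; higher values suggest more varied motion.
    """
    speeds = np.linalg.norm(np.diff(joints, axis=0), axis=-1)  # (T-1, J)
    entropies = []
    for j in range(speeds.shape[1]):
        hist, _ = np.histogram(speeds[:, j], bins=bins)
        p = hist / max(hist.sum(), 1)  # normalize to a probability mass
        p = p[p > 0]                   # drop empty bins before log
        entropies.append(float(-(p * np.log2(p)).sum()))
    return np.array(entropies)

def motion_beats(joints):
    """Frames where total joint speed hits a local minimum,
    a common proxy for kinematic beats in dance-motion work."""
    speed = np.linalg.norm(np.diff(joints, axis=0), axis=-1).sum(axis=1)
    is_min = (speed[1:-1] < speed[:-2]) & (speed[1:-1] < speed[2:])
    return np.where(is_min)[0] + 1

def beat_similarity(joints, music_beats, fps=30.0, sigma=0.1):
    """Mean Gaussian alignment between each music beat (in seconds)
    and its nearest motion beat."""
    mb = motion_beats(joints) / fps  # motion beat times in seconds
    if len(mb) == 0:
        return 0.0
    return float(np.mean([np.exp(-((mb - t) ** 2).min() / (2 * sigma ** 2))
                          for t in music_beats]))

# Toy usage on random motion; real input would come from a 3D pose estimator.
rng = np.random.default_rng(0)
joints = rng.normal(size=(300, 24, 3))   # ~10 s of 24-joint motion at 30 fps
music_beats = np.arange(0.5, 10.0, 0.5)  # a beat every 0.5 s (120 BPM)
print(kinematic_entropy(joints).mean(), beat_similarity(joints, music_beats))
```

Detecting motion beats as local minima of aggregate joint speed and scoring alignment with a Gaussian kernel follows the beat-alignment scores commonly used in music-driven dance generation; the paper may well use a different detector or kernel.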
ISSN: 2383-7632
DOI: 10.33851/JMIS.2024.11.2.149