An optimization method of human skeleton keyframes selection for action recognition

Bibliographic Details
Published in Complex & intelligent systems Vol. 10; no. 4; pp. 4659-4673
Main Authors Chen, Hao; Pan, Yuekai; Wang, Chenwu
Format Journal Article
Language English
Published Cham Springer International Publishing 01.08.2024
Springer Nature B.V
Springer
Summary: In action recognition based on the characteristics of human skeleton joint points, the selection of keyframes in the skeleton sequence is a significant issue that directly affects recognition accuracy. To improve the effectiveness of keyframe selection, this paper proposes inflection point frames and, based on them, transforms keyframe selection into a multi-objective optimization problem. First, pose features are extracted from the input skeleton joint point data and used to construct the pose feature vector of each frame in the time sequence; then, the inflection point frames in the sequence are determined according to the flow of momentum of each body part. Next, the pose feature vectors are input into the keyframe multi-objective optimization model, with the fusion of domain information and the number of keyframes; finally, the output keyframes are input to the action classifier. To verify the effectiveness of the method, three public datasets, MSR-Action3D, UTKinect-Action and Florence3D-Action, are chosen for simulation experiments. The results show that the keyframe sequence obtained by this method can significantly improve the accuracy of multiple action classifiers, with average recognition accuracies on the three datasets reaching 94.6%, 97.6% and 94.2%, respectively. In addition, combining the optimized keyframes with a deep learning classifier on the NTU RGB+D dataset yields accuracies of 83.2% and 93.7%.
ISSN: 2199-4536
2198-6053
DOI: 10.1007/s40747-024-01403-5