Advancements in Human Action Recognition Through 5G/6G Technology for Smart Cities: Fuzzy Integral-Based Fusion
| Published in | IEEE Transactions on Consumer Electronics, Vol. 70, No. 3, pp. 5783-5795 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.08.2024 |
| Subjects | |
Summary: 5G/6G technology improves skeleton-based human action recognition (HAR) by delivering the ultra-low latency and high data throughput needed for real-time, accurate security analysis of human actions. Despite its growing popularity, current HAR methods frequently fail to capture the complexities of the skeleton sequence. This study proposes a novel multimodal method that combines a Spatial-Temporal Attention LSTM (STA-LSTM) network with a Convolutional Neural Network (CNN) to extract nuanced features from the skeleton sequence. The STA-LSTM network models inter- and intra-frame relations, while the CNN uncovers geometric correlations within the human skeleton. By integrating the Choquet fuzzy integral, the method achieves a harmonized fusion of the classifiers built on each feature vector, and Kullback-Leibler and Jensen-Shannon divergences further ensure that the feature vectors are complementary. Evaluated on the benchmark skeletal datasets NTU-60, NTU-120, HDM05, and UTD-MHAD, the approach demonstrated impressive accuracy: cross-subject accuracies of 90.75% and 84.50%, and cross-setting accuracies of 96.70% and 86.70%, on NTU-60 and NTU-120, respectively, with 93.5% on HDM05 and 97.43% on UTD-MHAD. These results indicate that the model outperforms current techniques and has excellent potential for sentiment analysis platforms that combine textual and visual signals.
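The record gives no implementation details, but a minimal sketch of the fusion step the summary names (a discrete Choquet integral over two classifiers' class scores, using a Sugeno λ-fuzzy measure) could look as follows. The fuzzy densities, function names, and the two-classifier restriction are illustrative assumptions, not the paper's actual construction:

```python
import numpy as np

def lambda_root(g1, g2):
    # Sugeno lambda-measure constraint for two sources:
    # (1 + lam*g1) * (1 + lam*g2) = 1 + lam
    # => lam = (1 - g1 - g2) / (g1 * g2)  (lam = 0 when g1 + g2 = 1)
    return (1.0 - g1 - g2) / (g1 * g2)

def choquet_fuse(scores, densities):
    """Fuse per-classifier class scores with the discrete Choquet integral.

    scores    -- (2, n_classes) array of softmax outputs, one row per classifier
                 (e.g., the STA-LSTM stream and the CNN stream)
    densities -- fuzzy densities (g1, g2): the worth of each classifier alone
    """
    g = np.asarray(densities, dtype=float)
    lam = lambda_root(g[0], g[1])
    n_classes = scores.shape[1]
    fused = np.empty(n_classes)
    for k in range(n_classes):
        h = scores[:, k]
        order = np.argsort(-h)  # sort this class's scores descending
        g_prev, total = 0.0, 0.0
        for i in order:
            # grow the coalition measure: g(A ∪ {i}) = g_i + g(A) + lam*g_i*g(A)
            g_cur = g[i] + g_prev + lam * g[i] * g_prev
            total += h[i] * (g_cur - g_prev)
            g_prev = g_cur  # reaches 1.0 once both sources are included
        fused[k] = total
    return fused

# Hypothetical usage: p_lstm and p_cnn are the two streams' softmax outputs.
p_lstm = np.array([0.70, 0.20, 0.10])
p_cnn  = np.array([0.50, 0.40, 0.10])
fused = choquet_fuse(np.vstack([p_lstm, p_cnn]), densities=(0.6, 0.5))
print(fused, fused.argmax())  # fused class scores and predicted label
```

When the densities sum to more than one, λ is negative and the measure is subadditive, damping redundant agreement between the two streams; densities summing to less than one give superadditive, consensus-rewarding behavior.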
ISSN: 0098-3063, 1558-4127
DOI: 10.1109/TCE.2024.3420936
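The summary also invokes Kullback-Leibler and Jensen-Shannon divergences to ensure the two feature streams are complementary. The record does not say where these are computed; a plausible minimal sketch, assuming they compare the two classifiers' class-probability outputs, is:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) between discrete distributions."""
    p = np.clip(np.asarray(p, dtype=float), eps, None)
    q = np.clip(np.asarray(q, dtype=float), eps, None)
    p, q = p / p.sum(), q / q.sum()  # renormalize after clipping
    return float(np.sum(p * np.log(p / q)))

def js_divergence(p, q):
    """Symmetric, bounded Jensen-Shannon divergence built from KL."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical check: a larger divergence between the STA-LSTM and CNN
# output distributions suggests the two streams carry complementary information.
p_lstm = np.array([0.70, 0.20, 0.10])
p_cnn  = np.array([0.50, 0.40, 0.10])
print(js_divergence(p_lstm, p_cnn))
```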