A Transformer-based Unsupervised Domain Adaptation Method for Skeleton Behavior Recognition

Bibliographic Details
Published in: IEEE Access, Vol. 11, p. 1
Main Authors: Yan, QiuYan; Hu, Yan
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2023

More Information
Summary: In recent years, skeleton-based action recognition has received extensive attention, and many studies have achieved excellent performance. In this article, we investigate an unsupervised domain adaptation (UDA) method for skeleton-based action recognition, which is challenging in real-world scenes. In domain adaptation tasks, labels are available only in the source domain, not in the target domain. Unlike traditional UDA approaches such as adversarial learning-based methods, we adopt a transformer mechanism based on cross-attention to align the domains. It learns from both the source and target domains to reduce the domain shift between different skeleton datasets, thereby reducing the effect of pseudo-label errors generated during the adaptation process. Taking the particularities of skeleton data into account, we explore feature representations in both the spatial and temporal dimensions. We focus on the adjacency dependency of skeleton joints: each node is a weighted sum of its adjacent joints. This enables the network to attend to the global characteristics of skeleton data while considering the local characteristics of joint connections. Sequences are divided into several parts, called subs, to reduce the model's time cost. We conduct experiments on five skeleton-based action recognition datasets, including two large-scale datasets (NTU RGB+D, NW-UCLA). Extensive results demonstrate that our method outperforms other approaches in some respects.
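The cross-attention alignment described in the summary can be sketched roughly as follows: source-domain frame features act as queries and attend over target-domain features, producing target-conditioned representations that can be used to reduce domain shift. This is a minimal single-head NumPy illustration, not the authors' implementation; all names, shapes, and the projection matrices are assumptions for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(src, tgt, w_q, w_k, w_v):
    """Single-head cross-attention sketch: source frames (queries)
    attend over target frames (keys/values), yielding
    target-conditioned source features for domain alignment."""
    q = src @ w_q                          # (T_src, d)
    k = tgt @ w_k                          # (T_tgt, d)
    v = tgt @ w_v                          # (T_tgt, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product, (T_src, T_tgt)
    return softmax(scores, axis=-1) @ v    # (T_src, d)

# toy example: 4 source frames, 6 target frames, feature dim 8
rng = np.random.default_rng(0)
d = 8
src = rng.standard_normal((4, d))   # hypothetical pooled source skeleton features
tgt = rng.standard_normal((6, d))   # hypothetical pooled target skeleton features
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
aligned = cross_attention(src, tgt, w_q, w_k, w_v)
print(aligned.shape)  # (4, 8)
```

In the same spirit, the per-joint "weighted sum of adjacent joints" the summary mentions corresponds to multiplying joint features by a normalized skeleton adjacency matrix, and the "subs" trick amounts to splitting the frame axis into chunks before attention so the score matrix stays small.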
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3274658