Exploiting Static and Dynamic Human Joint Relations for 3D Pose Estimation via Cascade Transformers
Published in | 2022 26th International Conference on Pattern Recognition (ICPR) pp. 4522 - 4528 |
---|---|
Main Authors | , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 21.08.2022 |
Subjects | |
Summary: | The transformer has become the dominant model in natural language processing (NLP). Researchers have recently applied transformer architectures to various computer vision tasks and achieved competitive results. However, few works have explored transformer architectures for 3D human pose estimation (HPE). In this work, we propose cascade transformers, a novel transformer-based method for 3D HPE from a single image. Specifically, our cascade transformers consist of two transformer encoders that exploit static and dynamic human joint relations, respectively. Leveraging the self-attention module and the cascade structure, our method comprehensively models both kinds of relations. We evaluate our method on Human3.6M. Extensive experiments show that our method achieves excellent performance without explicitly using human skeleton priors. Notably, our single-image method achieves approximately the same performance as PoseFormer, the current best transformer-based method, even when PoseFormer uses 9 frames to predict the pose. |
---|---|
ISSN: | 2831-7475 |
DOI: | 10.1109/ICPR56361.2022.9956421 |