3D Human Animation Synthesis based on a Temporal Diffusion Generative Model

Bibliographic Details
Published in: 2024 2nd International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA), pp. 108-116
Main Authors: Cheng, Baoping; Feng, Wenke; Wu, Qinghang; Chen, Jie; Cai, Zhibiao; Zhang, Yemeng; Wang, Sheng; Che, Bin
Format: Conference Proceeding
Language: English
Published: IEEE, 24.05.2024
Summary: Three-dimensional human motion generation is an important branch of computer graphics with broad application prospects. Traditional human animation synthesis relies on professional simulation platforms and incurs high labor and time costs. Existing learning-based methods usually generate human animations from a given prior motion seed, which limits their generative ability and prevents them from producing a wide variety of human motions. Established generative methods, in turn, depend on a given prior sample distribution, so their creative capacity is also limited. To that end, we propose a distribution-free human motion synthesis workflow based on a temporal diffusion model. By specifying high-level motion semantics, our method can generate a wide variety of human motions with diverse styles. First, we construct a human motion dataset by selecting motion sequences that cover different motion types and labeling them with the corresponding motion semantics. Second, we use a Transformer-based temporal network to extract motion semantic features from different kinds of motion sequences and introduce a self-attention mechanism to ensure temporal continuity between adjacent frames. Then, we use the diffusion model to denoise the extracted motion semantic features and generate visually continuous, realistic, and detailed motion sequences. Finally, we conduct a series of experiments on the HumanAct12 and UESTC datasets. The results demonstrate that our method achieves better performance in motion reconstruction and generation, with clear improvements on several metrics, including RMSE, STED, FID, and diversity.
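
The workflow summarized above (a Transformer encoder applying self-attention over motion frames, used as the denoiser of a diffusion model conditioned on motion semantics) can be sketched roughly as below. This is an illustrative reconstruction, not the authors' code: the module name MotionDenoiser, the pose dimension, the linear DDPM noise schedule, and all hyperparameters are assumptions made for the sketch.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def timestep_embedding(t, dim):
    # Sinusoidal embedding of the diffusion step t (shape [B]) into [B, dim].
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, device=t.device) / half)
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([torch.cos(args), torch.sin(args)], dim=-1)

class MotionDenoiser(nn.Module):
    # Predicts the noise added to a motion sequence x_t of shape [B, T, pose_dim],
    # conditioned on the diffusion step and a semantic action label.
    def __init__(self, pose_dim=72, n_actions=12, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.d_model = d_model
        self.in_proj = nn.Linear(pose_dim, d_model)
        self.action_emb = nn.Embedding(n_actions, d_model)      # motion-semantic label
        self.time_proj = nn.Linear(d_model, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=512,
                                           batch_first=True)    # self-attention over frames
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out_proj = nn.Linear(d_model, pose_dim)

    def forward(self, x_t, t, action):
        cond = self.time_proj(timestep_embedding(t, self.d_model)) + self.action_emb(action)
        h = self.in_proj(x_t) + cond[:, None, :]                 # broadcast condition over frames
        h = self.encoder(h)                                      # temporal self-attention
        return self.out_proj(h)                                  # predicted noise, [B, T, pose_dim]

# One DDPM-style training step: noise a clean clip at a random step t, regress the noise back.
T_STEPS = 1000
betas = torch.linspace(1e-4, 0.02, T_STEPS)                      # linear noise schedule (assumed)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

model = MotionDenoiser()
x0 = torch.randn(8, 60, 72)                                      # dummy batch of clean motion clips
action = torch.randint(0, 12, (8,))                              # e.g. the 12 HumanAct12 classes
t = torch.randint(0, T_STEPS, (8,))
noise = torch.randn_like(x0)
x_t = (alphas_bar[t].sqrt()[:, None, None] * x0
       + (1.0 - alphas_bar[t]).sqrt()[:, None, None] * noise)
loss = F.mse_loss(model(x_t, t, action), noise)
loss.backward()

At sampling time, one would start from Gaussian noise and iterate the reverse denoising steps with this network to obtain a motion sequence for the requested action label.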
DOI: 10.1109/PRMVIA63497.2024.00028