MotionDiffuse: Text-Driven Human Motion Generation With Diffusion Model

Bibliographic Details
Published in	IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 46, no. 6, pp. 4115-4128
Main Authors	Zhang, Mingyuan; Cai, Zhongang; Pan, Liang; Hong, Fangzhou; Guo, Xinying; Yang, Lei; Liu, Ziwei
Format	Journal Article
Language	English
Published	United States: IEEE, 01.06.2024; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Algorithms; Body parts; Computer Graphics; Conditional motion generation; Decoding; diffusion model; Human motion; Humans; Image Processing, Computer-Assisted - methods; Mapping; Modelling; Motion; motion synthesis; Movement - physiology; Noise reduction; Pipelines; Probabilistic logic; Qualitative analysis; Sequences; Synthesis; Task analysis; text-driven generation; Training; Transformers
Summary:	Human motion modeling is important for many modern graphics applications, which typically require professional skills. In order to remove the skill barriers for laymen, recent motion generation methods can directly generate human motions conditioned on natural languages. However, it remains challenging to achieve diverse and fine-grained motion generation with various text inputs. To address this problem, we propose MotionDiffuse, one of the first diffusion model-based text-driven motion generation frameworks, which demonstrates several desired properties over existing methods. 1) Probabilistic Mapping. Instead of a deterministic language-motion mapping, MotionDiffuse generates motions through a series of denoising steps in which variations are injected. 2) Realistic Synthesis. MotionDiffuse excels at modeling complicated data distribution and generating vivid motion sequences. 3) Multi-Level Manipulation. MotionDiffuse responds to fine-grained instructions on body parts, and arbitrary-length motion synthesis with time-varied text prompts. Our experiments show MotionDiffuse outperforms existing SoTA methods by convincing margins on text-driven motion generation and action-conditioned motion generation. A qualitative analysis further demonstrates MotionDiffuse's controllability for comprehensive motion generation.
ISSN:	0162-8828 1939-3539 2160-9292
DOI:	10.1109/TPAMI.2024.3355414
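
Illustration:	The Summary describes generating motions "through a series of denoising steps in which variations are injected." This record does not give the paper's network or sampler, so the sketch below only illustrates that general idea with generic DDPM-style ancestral sampling on a toy problem. Everything in it is an assumption made for illustration: the names `make_schedule`, `toy_noise_predictor`, `sample_motion`, and `TOY_PROMPT_MEANS` are hypothetical, the linear beta schedule is a common default rather than a documented choice, and the "network" is a closed-form noise predictor for Gaussian toy data, not a trained text-conditioned model.

```python
import math
import random

# Hypothetical toy "conditioning": each prompt selects the mean of a Gaussian
# data distribution. A real system would use a trained text-conditioned network.
TOY_PROMPT_MEANS = {"walk": 1.0, "jump": -1.0}
TOY_DATA_STD = 0.5


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and the cumulative products of alpha = 1 - beta."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars


def toy_noise_predictor(x, t, prompt, alpha_bars):
    """Stand-in for a learned denoiser (assumption, not the paper's model).

    For clean data drawn from N(mu, s^2), the noisy sample at step t follows
    N(sqrt(abar) * mu, abar * s^2 + 1 - abar), which gives the ideal noise
    estimate in closed form.
    """
    mu = TOY_PROMPT_MEANS.get(prompt, 0.0)
    abar = alpha_bars[t]
    var = abar * TOY_DATA_STD ** 2 + (1.0 - abar)
    gain = math.sqrt(1.0 - abar) / var
    return [[gain * (v - math.sqrt(abar) * mu) for v in frame] for frame in x]


def sample_motion(prompt, num_frames=4, dim=3, num_steps=1000, seed=0):
    """Ancestral sampling: start from pure noise and denoise step by step."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    # x holds num_frames "poses" of dim values each, initialised as noise.
    x = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_frames)]
    for t in reversed(range(num_steps)):
        eps = toy_noise_predictor(x, t, prompt, alpha_bars)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        inv_sqrt_alpha = 1.0 / math.sqrt(1.0 - betas[t])
        # Fresh noise is injected at every step except the last one; this is
        # where the sample-to-sample variation comes from.
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0
        x = [[inv_sqrt_alpha * (v - coef * e) + sigma * rng.gauss(0.0, 1.0)
              for v, e in zip(frame, eps_frame)]
             for frame, eps_frame in zip(x, eps)]
    return x


if __name__ == "__main__":
    first = sample_motion("walk", seed=0)
    second = sample_motion("walk", seed=1)
    print(first[0])
    print(second[0])
```

Calling `sample_motion` twice with the same prompt but different seeds returns different sequences, which is the sense in which a diffusion sampler gives a probabilistic rather than deterministic text-to-motion mapping; fixing the seed makes a run reproducible.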