Learning Uniformly Distributed Embedding Clusters of Stylistic Skills for Physically Simulated Characters

Bibliographic Details
Published in	arXiv.org
Main Authors	Liu, Nian; Liu, Libin; Zhang, Zilong; Wang, Zi; Xie, Hongzhao; Liu, Tengyu; Tong, Xinyi; Yang, Yaodong; He, Zhaofeng
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 10.11.2024
Subjects	Clusters; Controllers; Datasets; Embedding; Human motion; Hyperspheres; Learning; Skills
Summary:	Learning natural and diverse behaviors from human motion datasets remains challenging in physics-based character control. Existing conditional adversarial models often suffer from tight and biased embedding distributions, in which embeddings from the same motion are grouped closely in a small region and shorter motions occupy even less space. Our empirical observations indicate that this limits the representational capacity and diversity available under each skill. An ideal latent space should be maximally packed by the embedding clusters of all motions. In this paper, we propose a skill-conditioned controller that learns diverse skills with expressive variations. Our approach leverages the Neural Collapse phenomenon, a natural outcome of the classification-based encoder, to uniformly distribute cluster centers. We additionally propose a novel Embedding Expansion technique to form stylistic embedding clusters for diverse skills that are uniformly distributed on a hypersphere, maximizing the representational area occupied by each skill and minimizing unmapped regions. This maximally packed and uniformly distributed embedding space ensures that embeddings within the same cluster generate behaviors that conform to the characteristics of the corresponding motion clips yet exhibit noticeable variations within each cluster. Compared to existing methods, our controller not only generates high-quality, diverse motions covering the entire dataset but also achieves superior controllability, motion coverage, and diversity under each skill. Both qualitative and quantitative results confirm these traits, which allow our controller to be applied to a wide range of downstream tasks and to serve as a cornerstone for diverse applications.
ISSN:	2331-8422
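
Illustrative note: the summary says that Neural Collapse yields uniformly distributed cluster centers on a hypersphere. In the Neural Collapse literature, this configuration is usually described as a simplex equiangular tight frame (ETF): K unit vectors whose pairwise cosine similarity is -1/(K-1). The sketch below builds such a frame with NumPy and checks that property. The function names (simplex_etf, sample_near_center), the noise model, and all parameter values are assumptions made for illustration. In particular, sample_near_center is a generic Gaussian-perturbation sampler and is not the paper's Embedding Expansion technique, whose details this record does not give.

```python
import numpy as np


def simplex_etf(num_classes, dim, seed=0):
    """Return (num_classes, dim) unit vectors forming a simplex ETF.

    Hypothetical helper: uses the standard construction
    M = sqrt(K / (K - 1)) * U (I - 11^T / K), with U having orthonormal columns.
    """
    if num_classes < 2 or dim < num_classes:
        raise ValueError("need num_classes >= 2 and dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Reduced QR of a random Gaussian matrix gives orthonormal columns.
    q, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))
    centering = np.eye(num_classes) - np.ones((num_classes, num_classes)) / num_classes
    frame = np.sqrt(num_classes / (num_classes - 1)) * q @ centering
    return frame.T


def sample_near_center(center, noise_scale, num_samples, seed=0):
    """Sample unit vectors around a center (assumed noise model, not the paper's)."""
    rng = np.random.default_rng(seed)
    noisy = center + noise_scale * rng.standard_normal((num_samples, center.shape[0]))
    # Project the perturbed points back onto the unit hypersphere.
    return noisy / np.linalg.norm(noisy, axis=1, keepdims=True)


centers = simplex_etf(num_classes=4, dim=8)
cosines = centers @ centers.T  # diagonal is 1, off-diagonal is -1/(K-1)
samples = sample_near_center(centers[0], noise_scale=0.1, num_samples=5)
```

With four centers, every off-diagonal entry of cosines equals -1/3, the most negative value that four unit vectors can share pairwise, which is the sense in which the centers are spread uniformly.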