Learning Generative Models for Multi-Activity Body Pose Estimation

Bibliographic Details
Published in	International journal of computer vision Vol. 83; no. 2; pp. 121 - 134
Main Authors	Jaeggli, Tobias; Koller-Meier, Esther; Van Gool, Luc
Format	Journal Article; Conference Proceeding
Language	English
Published	Boston: Springer US (Springer; Springer Nature B.V), 01.06.2009
Subjects	Applied sciences | Artificial Intelligence | Computer Imaging | Computer Science | Computer science; control theory; systems | Exact sciences and technology | Image Processing and Computer Vision | Pattern Recognition | Pattern Recognition and Graphics | Pattern recognition. Digital image processing. Computational geometry | Studies | Vision | Dimensionality reduction | Activity recognition | Human locomotion | Monocular pose estimation | Machine learning | Tracking | Segmentation | Image processing | N body system | Probability distribution | Modeling | Posture | Image sequence | Sampling | Transfer function | Localization | Posterior distribution | Generative model | Bayes estimation | Image resolution | Monocular vision | Inference | Discrete geometry | Dimension reduction | Posterior probability | Artificial intelligence | Principal component analysis
Summary:	We present a method to simultaneously estimate 3D body pose and action categories from monocular video sequences. Our approach learns a generative model of the relationship between body pose and image appearance using a sparse kernel regressor, which captures the nonlinearities of this mapping efficiently. Body poses are modelled on a low-dimensional manifold obtained by Locally Linear Embedding dimensionality reduction. In addition, we learn a prior model of likely body poses and a dynamical model in this pose manifold. Within a Recursive Bayesian Sampling framework, the potentially multimodal posterior probability distributions can then be inferred. An activity-switching mechanism based on learned transfer functions allows for inference of the performed activity class, along with the estimation of body pose and 2D image location of the subject. Using a rough foreground segmentation, we compare Binary PCA and distance transforms to encode the appearance. As a postprocessing step, the globally optimal trajectory through the entire sequence is estimated, yielding a single pose estimate per frame that is consistent throughout the sequence. We evaluate the algorithm on challenging sequences with subjects who alternate between running and walking movements. Our experiments show how the dynamical model helps to track through poorly segmented low-resolution image sequences where tracking otherwise fails, while at the same time reliably classifying the activity type.
ISSN:	0920-5691 (print); 1573-1405 (electronic)
DOI:	10.1007/s11263-008-0158-0
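
Illustration:	The Summary mentions a postprocessing step that estimates the globally optimal trajectory through the sequence, giving one consistent pose per frame. The record does not say how this is computed. The sketch below is one possible reading, assuming a Viterbi-style dynamic program over the weighted pose samples of each frame. The function name `best_trajectory`, the `smoothness` parameter, and the squared-distance transition cost are hypothetical and are not taken from the article.

```python
import math


def best_trajectory(candidates, weights, smoothness=1.0):
    """Pick one candidate pose per frame by dynamic programming (Viterbi).

    candidates[t][i]: pose vector (tuple of floats) of sample i in frame t
    weights[t][i]:    positive posterior weight of that sample
    smoothness:       hypothetical penalty on squared pose change between frames
    """
    # cost[i] = lowest total cost of any path ending at sample i of the current frame
    cost = [-math.log(w) for w in weights[0]]
    back = []  # back[t - 1][j] = best predecessor (in frame t - 1) of sample j in frame t
    for t in range(1, len(candidates)):
        new_cost, pointers = [], []
        for j, pose in enumerate(candidates[t]):
            # Assumed transition cost: squared Euclidean distance to each predecessor.
            options = [
                cost[i] + smoothness * sum((a - b) ** 2 for a, b in zip(prev, pose))
                for i, prev in enumerate(candidates[t - 1])
            ]
            i_best = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[i_best] - math.log(weights[t][j]))
            pointers.append(i_best)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path back from the last frame to the first.
    j = min(range(len(cost)), key=cost.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[t][i] for t, i in enumerate(path)]


# Toy bimodal posterior: two well-separated pose hypotheses per frame.
frames = [[(0.0,), (5.0,)], [(0.2,), (5.1,)], [(0.4,), (5.2,)]]
frame_weights = [[0.6, 0.4], [0.4, 0.6], [0.6, 0.4]]
print(best_trajectory(frames, frame_weights))  # [(0.0,), (0.2,), (0.4,)]
```

In this toy input, taking the highest-weight sample in each frame separately would jump from the first mode to the second and back. The dynamic program instead stays on one mode, which is the kind of sequence-wide consistency the Summary describes.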