SmartMocap: Joint Estimation of Human and Camera Motion Using Uncalibrated RGB Cameras

Bibliographic Details
Published in: IEEE Robotics and Automation Letters, Vol. 8, No. 6, pp. 3206-3213
Main Authors: Saini, Nitin; Huang, Chun-Hao P.; Black, Michael J.; Ahmad, Aamir
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.06.2023

Summary: Markerless human motion capture (mocap) from multiple RGB cameras is a widely studied problem. Existing methods either need calibrated cameras or calibrate them relative to a static camera, which acts as the reference frame for the mocap system. The calibration step has to be done a priori for every capture session, which is a tedious process, and re-calibration is required whenever cameras are intentionally or accidentally moved. In this letter, we propose a mocap method which uses multiple static and moving extrinsically uncalibrated RGB cameras. The key components of our method are as follows. First, since the cameras and the subject can move freely, we select the ground plane as a common reference to represent both the body and the camera motions, unlike existing methods which represent bodies in the camera coordinate system. Second, we learn a probability distribution of short human motion sequences (~1 sec) relative to the ground plane and leverage it to disambiguate between the camera and human motion. Third, we use this distribution as a motion prior in a novel multi-stage optimization approach to fit the SMPL human body model and the camera poses to the human body keypoints on the images. Finally, we show that our method can work on a variety of datasets ranging from aerial cameras to smartphones. It also gives more accurate results compared to the state-of-the-art on the task of monocular human mocap with a static camera.
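The summary describes a joint optimization: body pose and per-frame camera poses are both expressed relative to the ground plane and fitted to 2D keypoints, with a learned short-horizon motion prior resolving the camera/human motion ambiguity. The following is a minimal sketch of that kind of objective, not the authors' implementation: the SMPL model is replaced by free world-frame 3D joints, the learned ~1 s motion prior by a simple temporal-smoothness penalty, and all keypoints, intrinsics, and problem sizes are invented placeholders.

```python
# Hypothetical sketch (not the SmartMocap code): jointly optimize world-frame
# body joints and per-camera, per-frame extrinsics so that projected joints
# match detected 2D keypoints, regularized by a crude motion-smoothness term.
import torch

T, J, C = 30, 24, 2        # frames, body joints, cameras (illustrative sizes)
FOCAL = 1000.0             # assumed shared pinhole focal length, no distortion

# Stand-ins for detected 2D keypoints (pixels) and per-keypoint confidences.
keypoints_2d = torch.randn(C, T, J, 2) * 100.0
confidence = torch.ones(C, T, J, 1)

# Unknowns: 3D joints in the ground-plane (world) frame and per-camera,
# per-frame extrinsics (axis-angle rotation + translation), so both the
# subject and the cameras are free to move.
joints_world = (torch.randn(T, J, 3) * 0.1).requires_grad_()
cam_rot = torch.zeros(C, T, 3, requires_grad=True)
cam_trans = torch.zeros(C, T, 3)
cam_trans[..., 2] = 5.0    # start the cameras a few meters from the subject
cam_trans.requires_grad_()

def axis_angle_to_matrix(aa):
    """Rodrigues' formula for a batch of axis-angle vectors (..., 3)."""
    angle = aa.norm(dim=-1, keepdim=True).clamp(min=1e-8)
    x, y, z = (aa / angle).unbind(-1)
    zero = torch.zeros_like(x)
    K = torch.stack([zero, -z, y, z, zero, -x, -y, x, zero], dim=-1)
    K = K.reshape(*aa.shape[:-1], 3, 3)
    a = angle.unsqueeze(-1)
    return torch.eye(3) + torch.sin(a) * K + (1.0 - torch.cos(a)) * (K @ K)

def project(points_w, R, t):
    """Pinhole projection of world points into every camera and frame."""
    p_cam = torch.einsum('ctij,tkj->ctki', R, points_w) + t.unsqueeze(2)
    return FOCAL * p_cam[..., :2] / p_cam[..., 2:3].clamp(min=1e-6)

optimizer = torch.optim.Adam([joints_world, cam_rot, cam_trans], lr=1e-2)
for step in range(200):
    optimizer.zero_grad()
    R = axis_angle_to_matrix(cam_rot)
    proj = project(joints_world, R, cam_trans)
    # Confidence-weighted 2D reprojection term over all cameras and frames.
    reproj = (confidence * (proj - keypoints_2d) ** 2).mean()
    # Placeholder for the learned motion prior: discourage jerky body motion
    # in the world frame between consecutive frames.
    smooth = (joints_world[1:] - joints_world[:-1]).pow(2).mean()
    loss = reproj + 10.0 * smooth
    loss.backward()
    optimizer.step()
```

In the paper the smoothness surrogate above is replaced by a learned distribution over short motion sequences, and the optimization proceeds in multiple stages rather than a single Adam loop; the sketch only illustrates why expressing both body and cameras in the ground-plane frame lets one objective recover both.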
ISSN: 2377-3766
DOI: 10.1109/LRA.2023.3264743