SmartMocap: Joint Estimation of Human and Camera Motion Using Uncalibrated RGB Cameras

Bibliographic Details
Published in: IEEE Robotics and Automation Letters, Vol. 8, No. 6, pp. 3206-3213
Main Authors: Saini, Nitin; Huang, Chun-Hao P.; Black, Michael J.; Ahmad, Aamir
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.06.2023

Summary: Markerless human motion capture (mocap) from multiple RGB cameras is a widely studied problem. Existing methods either need calibrated cameras or calibrate them relative to a static camera, which acts as the reference frame for the mocap system. The calibration step has to be done a priori for every capture session, which is a tedious process, and re-calibration is required whenever cameras are intentionally or accidentally moved. In this letter, we propose a mocap method which uses multiple static and moving extrinsically uncalibrated RGB cameras. The key components of our method are as follows. First, since the cameras and the subject can move freely, we select the ground plane as a common reference to represent both the body and the camera motions, unlike existing methods which represent bodies in the camera coordinate system. Second, we learn a probability distribution of short human motion sequences (~1 sec) relative to the ground plane and leverage it to disambiguate between the camera and human motion. Third, we use this distribution as a motion prior in a novel multi-stage optimization approach to fit the SMPL human body model and the camera poses to the human body keypoints on the images. Finally, we show that our method can work on a variety of datasets ranging from aerial cameras to smartphones. It also gives more accurate results compared to the state-of-the-art on the task of monocular human mocap with a static camera.
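The summary describes a joint optimization: body pose and per-frame camera poses are both expressed relative to the ground plane and fitted to 2D keypoints, with a learned short-horizon motion prior resolving the camera/human motion ambiguity. The following is a minimal sketch of that kind of objective, not the authors' implementation: the SMPL model is replaced by free world-frame 3D joints, the learned ~1 s motion prior by a simple temporal-smoothness penalty, and all keypoints, intrinsics, and problem sizes are invented placeholders.

```python
# Hypothetical sketch (not the SmartMocap code): jointly optimize world-frame
# body joints and per-camera, per-frame extrinsics so that projected joints
# match detected 2D keypoints, regularized by a crude motion-smoothness term.
import torch

T, J, C = 30, 24, 2        # frames, body joints, cameras (illustrative sizes)
FOCAL = 1000.0             # assumed shared pinhole focal length, no distortion

# Stand-ins for detected 2D keypoints (pixels) and per-keypoint confidences.
keypoints_2d = torch.randn(C, T, J, 2) * 100.0
confidence = torch.ones(C, T, J, 1)

# Unknowns: 3D joints in the ground-plane (world) frame and per-camera,
# per-frame extrinsics (axis-angle rotation + translation), so both the
# subject and the cameras are free to move.
joints_world = (torch.randn(T, J, 3) * 0.1).requires_grad_()
cam_rot = torch.zeros(C, T, 3, requires_grad=True)
cam_trans = torch.zeros(C, T, 3)
cam_trans[..., 2] = 5.0    # start the cameras a few meters from the subject
cam_trans.requires_grad_()

def axis_angle_to_matrix(aa):
    """Rodrigues' formula for a batch of axis-angle vectors (..., 3)."""
    angle = aa.norm(dim=-1, keepdim=True).clamp(min=1e-8)
    x, y, z = (aa / angle).unbind(-1)
    zero = torch.zeros_like(x)
    K = torch.stack([zero, -z, y, z, zero, -x, -y, x, zero], dim=-1)
    K = K.reshape(*aa.shape[:-1], 3, 3)
    a = angle.unsqueeze(-1)
    return torch.eye(3) + torch.sin(a) * K + (1.0 - torch.cos(a)) * (K @ K)

def project(points_w, R, t):
    """Pinhole projection of world points into every camera and frame."""
    p_cam = torch.einsum('ctij,tkj->ctki', R, points_w) + t.unsqueeze(2)
    return FOCAL * p_cam[..., :2] / p_cam[..., 2:3].clamp(min=1e-6)

optimizer = torch.optim.Adam([joints_world, cam_rot, cam_trans], lr=1e-2)
for step in range(200):
    optimizer.zero_grad()
    R = axis_angle_to_matrix(cam_rot)
    proj = project(joints_world, R, cam_trans)
    # Confidence-weighted 2D reprojection term over all cameras and frames.
    reproj = (confidence * (proj - keypoints_2d) ** 2).mean()
    # Placeholder for the learned motion prior: discourage jerky body motion
    # in the world frame between consecutive frames.
    smooth = (joints_world[1:] - joints_world[:-1]).pow(2).mean()
    loss = reproj + 10.0 * smooth
    loss.backward()
    optimizer.step()
```

In the paper the smoothness surrogate above is replaced by a learned distribution over short motion sequences, and the optimization proceeds in multiple stages rather than a single Adam loop; the sketch only illustrates why expressing both body and cameras in the ground-plane frame lets one objective recover both.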
ISSN: 2377-3766
DOI: 10.1109/LRA.2023.3264743