Evaluation of Camera Pose Estimation Using Human Head Pose Estimation

Bibliographic Details
Published in	SN Computer Science, Vol. 4, no. 3, p. 301
Main Authors	Fischer, Robert; Hödlmoser, Michael; Gelautz, Margrit
Format	Journal Article
Language	English
Published	Singapore: Springer Nature Singapore, 01.05.2023; Springer Nature B.V
Subjects	Accuracy; Advances on Computer Vision; Calibration; Cameras; Computer Imaging; Computer Science; Computer Systems Organization and Communication Networks; Data Structures and Information Theory; Datasets; Head; Imaging and Computer Graphics Theory and Applications; Information Systems and Communication Service; Neural networks; Original Research; Pattern Recognition and Graphics; Pose estimation; Robotics; Software Engineering/Programming and Operating Systems; Three dimensional models; Vision; Evaluation; Camera networks; Extrinsic calibration; Camera pose estimation; Head pose estimation

More Information
Summary:	We introduce and evaluate a novel camera pose estimation framework that uses the human head as a calibration object. The proposed method performs extrinsic calibration from 2D input images (NIR and/or RGB), relying only on the detected human head and requiring no depth information. The approach is applicable to single cameras or multi-camera networks. Our implementation uses a fine-tuned deep learning-based 2D human facial landmark detector and estimates the 3D human head pose by fitting a 3D head model to the detected 2D facial landmarks. Our work focuses on evaluating the proposed approach on real multi-camera recordings and synthetic renderings to determine the accuracy of the pose estimation results and their applicability. We assess the robustness of our method against different input parameters, such as varying relative camera positions, variations of head models, face occlusions (by masks, sunglasses, etc.), and potential biases and variance among humans. Based on the experimental results, we expect our approach to be effective for numerous use cases, including automotive attention monitoring, robotics, VR/AR, and other scenarios where ease of handling outweighs accuracy.
ISSN:	2661-8907, 2662-995X
DOI:	10.1007/s42979-023-01709-0
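
Illustrative note: the summary describes using the head as a shared calibration object across cameras. The sketch below is not taken from the article. It only shows the standard rigid-transform identity such a setup can rely on: if the same head pose is known in two camera frames, the relative pose between the cameras follows by composing one pose with the inverse of the other. All function names and numeric values are hypothetical, and the head poses are assumed to be given already (for example, from a model-fitting step such as the one the summary mentions).

```python
import math

def rot_z(deg):
    """3x3 rotation about the z axis by `deg` degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def make_pose(R, t):
    """Build a 4x4 rigid transform from a 3x3 rotation and a translation."""
    return [R[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def matmul(A, B):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert_pose(T):
    """Invert a rigid transform: rotation R becomes R^T, translation t becomes -R^T t."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [-sum(Rt[i][k] * T[k][3] for k in range(3)) for i in range(3)]
    return make_pose(Rt, t)

def relative_camera_pose(T_a_head, T_b_head):
    """Return the transform that maps camera-A coordinates to camera-B coordinates.

    T_a_head and T_b_head map head coordinates into camera A and camera B,
    respectively, for the same instant in time.
    """
    return matmul(T_b_head, invert_pose(T_a_head))

# Hypothetical ground truth: camera B relative to camera A.
T_b_a_true = make_pose(rot_z(30.0), [0.5, 0.0, 0.1])
# Hypothetical head pose as seen from camera A.
T_a_head = make_pose(rot_z(-10.0), [0.0, 0.1, 0.8])
# The same head as seen from camera B.
T_b_head = matmul(T_b_a_true, T_a_head)

# Recover the camera-to-camera transform from the two head poses alone.
T_b_a = relative_camera_pose(T_a_head, T_b_head)
```

With noise-free inputs, `T_b_a` matches `T_b_a_true` up to floating-point error. With estimated head poses, the result inherits their errors, which is the kind of accuracy question the summary says the article evaluates.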