Anisotropic angle distribution learning for head pose estimation and attention understanding in human-computer interaction

Bibliographic Details
Published in Neurocomputing (Amsterdam) Vol. 433; pp. 310-322
Main Authors Liu, Hai; Nie, Hanwen; Zhang, Zhaoli; Li, You-Fu
Format Journal Article
Language English
Published Elsevier B.V. 14.04.2021
Summary:
• A novel anisotropic angle distribution learning (AADL) method is proposed for head pose estimation.
• For a given central pose, increasing the pose angle by the same amount in the yaw and pitch directions produces different head pose image variations.
• A 2D Gaussian-like distribution is defined to fit the anisotropic angle labels.
• The robustness of the proposed model is verified by extensive experiments.

Head pose estimation is an important way to understand human attention in human-computer interaction. In this paper, we propose a novel anisotropic angle distribution learning (AADL) network for the head pose estimation task. First, two key findings are reported: 1) for a fixed central pose, the same increase in pose angle produces different head pose image variations in the yaw and pitch directions; 2) as the angle increases by a fixed interval, the image variations in the yaw direction first increase and then decrease. Then, the maximum a posteriori technique is employed to construct the head pose estimation network, which consists of three parts: a convolutional layer, a covariance pooling layer, and an output layer. In the output layer, the labels are constructed as anisotropic angle distributions on the basis of the two key findings, and these distributions are fitted by 2D Gaussian-like distributions (the ground-truth labels). Furthermore, the Kullback-Leibler divergence is selected to measure the difference between the predicted label and the ground-truth label. The AADL-based convolutional neural network learns the features of head pose images in an end-to-end manner. Experimental results demonstrate that the developed AADL-based labels have several advantages, such as robustness to missing head pose images and insensitivity to motion blur. Moreover, the proposed method performs well compared with several state-of-the-art methods on the Pointing’04 and CAS_PEAL_R1 databases.
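
The label construction and loss described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the angle grid, and the spreads `sigma_yaw` and `sigma_pitch` are assumptions chosen only for illustration, and the sketch uses one fixed pair of spreads, so it does not model the second finding (variation in yaw that first increases and then decreases). It shows a 2D Gaussian-like label whose spread differs between yaw and pitch, and the Kullback-Leibler divergence between a target label and a prediction.

```python
import numpy as np

def anisotropic_label(yaw_grid, pitch_grid, yaw0, pitch0, sigma_yaw, sigma_pitch):
    """Normalized 2D Gaussian-like label over a (pitch, yaw) grid.

    Using different spreads for yaw and pitch makes the label anisotropic.
    The spreads are hypothetical values, not taken from the paper.
    """
    d_yaw = (yaw_grid[None, :] - yaw0) / sigma_yaw        # shape (1, n_yaw)
    d_pitch = (pitch_grid[:, None] - pitch0) / sigma_pitch  # shape (n_pitch, 1)
    weights = np.exp(-0.5 * (d_yaw ** 2 + d_pitch ** 2))  # broadcast to 2D
    return weights / weights.sum()                        # sums to 1

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two discrete distributions of equal shape."""
    target = np.clip(target, eps, None)       # avoid log(0)
    predicted = np.clip(predicted, eps, None)
    return float(np.sum(target * np.log(target / predicted)))

# Illustrative angle grid in degrees (an assumption, not a dataset specification).
yaw_grid = np.arange(-90, 91, 15, dtype=float)
pitch_grid = np.arange(-90, 91, 30, dtype=float)

# Ground-truth label centred on yaw = 30, pitch = 0 with arbitrary spreads.
label = anisotropic_label(yaw_grid, pitch_grid, yaw0=30.0, pitch0=0.0,
                          sigma_yaw=20.0, sigma_pitch=10.0)

# A uniform "prediction" serves as a simple baseline for the loss.
uniform = np.full_like(label, 1.0 / label.size)
loss = kl_divergence(label, uniform)
```

In a training setup, `predicted` would be the softmax output of the network over the same grid, and the divergence would be minimized with respect to the network weights.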
ISSN:0925-2312
1872-8286
DOI:10.1016/j.neucom.2020.09.068