Meta-Learning, Fast Adaptation, and Latent Representation for Head Pose Estimation

Bibliographic Details
Published in	2022 31st Conference of Open Innovations Association (FRUCT), Vol. 31, no. 1, pp. 71-78
Main Authors	Joshi, Manoj; Pant, Dibakar Raj; Karn, Rupesh Raj; Heikkonen, Jukka; Kanth, Rajeev
Format	Conference Proceeding; Journal Article
Language	English
Published	FRUCT Oy 2022 FRUCT
Subjects	Adaptation models; deep learning; few-shot learning; head pose estimation; Human computer interaction; meta-learning; Pose estimation; Representation learning; Technological innovation; Training; Training data
Summary:	Head pose estimation is used in a variety of human-computer interaction applications, such as gaze tracking, driver assistance, assistance for people with impairments, and entertainment. Advances in convolutional neural networks have considerably improved the performance of head pose estimation. However, the difficulty of capturing well-labelled head pose data and the differences in facial features between individuals make these networks difficult to use. This work proposes a meta-learning based technique for the head pose estimation problem on the BIWI head pose dataset. First, a latent representation of head pose features is learned using a variational autoencoder. Then a fast, adaptable head pose estimator is trained using meta-learning in few-shot settings, with the model-agnostic meta-learning (MAML) algorithm used for training. A Mean Average Error (MAE_avg) of 7.33 is achieved in predicting head pose angles in one-shot settings. After meta-training, the optimized model is used to analyze fast adaptation on a test set held out from the BIWI head pose dataset. Starting from the meta-trained network's parameters, the inner loop is optimized for quick adaptation. The optimized model can predict accurate head poses using as few as 10 gradient descent steps on unseen tasks sampled from the test set.
ISSN:	2305-7254 2343-0737
DOI:	10.23919/FRUCT54823.2022.9770932
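
Illustrative sketch: the summary describes meta-training a pose estimator with MAML and then adapting it to unseen tasks with a few gradient steps. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a linear regressor in place of their network and random synthetic vectors in place of variational autoencoder latents from BIWI; every name and hyperparameter (LATENT_DIM, sample_task, inner_lr, and so on) is hypothetical. Because the model is linear, the exact MAML meta-gradient for one inner step can be written in closed form as (I - inner_lr * H_support) times the query gradient at the adapted weights.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, N_ANGLES = 8, 3  # assumed sizes: latent code -> (yaw, pitch, roll)
W_SHARED = rng.normal(size=(LATENT_DIM, N_ANGLES))  # structure common to all tasks


def sample_task(k_support=5, k_query=20):
    """Return (support, query) sets for one synthetic task (stand-in for a subject)."""
    w_task = W_SHARED + 0.3 * rng.normal(size=W_SHARED.shape)

    def draw(n):
        z = rng.normal(size=(n, LATENT_DIM))  # synthetic stand-in for VAE latents
        return z, z @ w_task                  # synthetic pose targets

    return draw(k_support), draw(k_query)


def grad(w, z, y):
    """Gradient of the loss 1/(2n) * ||z @ w - y||^2 with respect to w."""
    return z.T @ (z @ w - y) / len(z)


def mae(w, z, y):
    """Mean absolute error over all samples and all three outputs."""
    return float(np.mean(np.abs(z @ w - y)))


def adapt(w, z, y, lr=0.05, steps=10):
    """Inner loop: plain gradient descent on the support set."""
    for _ in range(steps):
        w = w - lr * grad(w, z, y)
    return w


def meta_train(iterations=500, tasks_per_batch=4, inner_lr=0.05, outer_lr=0.05):
    """MAML with one inner step, using the exact second-order meta-gradient."""
    w = np.zeros((LATENT_DIM, N_ANGLES))
    eye = np.eye(LATENT_DIM)
    for _ in range(iterations):
        meta_grad = np.zeros_like(w)
        for _ in range(tasks_per_batch):
            (zs, ys), (zq, yq) = sample_task()
            w_fast = w - inner_lr * grad(w, zs, ys)  # adapted weights
            hessian = zs.T @ zs / len(zs)            # Hessian of the support loss
            # Chain rule through the inner step: d(w_fast)/d(w) = I - inner_lr * H
            meta_grad += (eye - inner_lr * hessian) @ grad(w_fast, zq, yq)
        w = w - outer_lr * meta_grad / tasks_per_batch
    return w


def evaluate(w, n_tasks=50, lr=0.05, steps=10):
    """Average query MAE over fresh tasks, before and after adaptation."""
    before, after = [], []
    for _ in range(n_tasks):
        (zs, ys), (zq, yq) = sample_task()
        before.append(mae(w, zq, yq))
        after.append(mae(adapt(w, zs, ys, lr, steps), zq, yq))
    return float(np.mean(before)), float(np.mean(after))


if __name__ == "__main__":
    w_meta = meta_train()
    mae_before, mae_after = evaluate(w_meta)
    print(f"query MAE before adaptation: {mae_before:.3f}")
    print(f"query MAE after 10 steps:    {mae_after:.3f}")
```

The evaluation mirrors the fast-adaptation setting in the summary: it starts from the meta-trained weights and takes 10 gradient descent steps on a small support set from a task not seen during meta-training. The numbers it prints describe only this synthetic toy problem and are not comparable to the MAE_avg reported for BIWI.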