Fusion-competition framework of local topology and global texture for head pose estimation


Bibliographic Details
Published in: Pattern Recognition, Vol. 149, p. 110285
Main Authors: Ma, Dongsheng, Fu, Tianyu, Yang, Yifei, Cao, Kaibin, Fan, Jingfan, Xiao, Deqiang, Song, Hong, Gu, Ying, Yang, Jian
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.05.2024
Summary:
•The proposed method combines heterogeneous data to fully utilize the texture information of the RGB image and the geometric information of the point cloud. Compared with a depth image, a point cloud carries stronger topological features, which can be learned jointly with texture features for accurate and robust head pose estimation.
•The proposed framework achieves feature fusion at the texture-topology level and generates feature competition among the local regions. This fusion-competition design enhances the expression of features of different categories at different levels, decreasing the estimation error and increasing the stability.
•This paper constructs an RGB-Depth dataset using HoloLens 2 for training and testing head pose estimation. The dataset contains abundant head pose samples: 24 sessions with 12 K frames from 21 males and 1 female, with the ground-truth pose of each frame labeled by an accurate tracking device attached to the head.

RGB images and point clouds provide texture and geometric structure, respectively, and are widely used for head pose estimation. However, images lack spatial information, and the quality of a point cloud is easily affected by sensor noise. In this paper, a novel fusion-competition framework (FCF) is proposed to overcome the limitations of a single modality. Global texture information is extracted from the image and local topology information from the point cloud, projecting the heterogeneous data into a common feature subspace. The projected texture feature, weighted by a channel attention mechanism, is embedded into each local point-cloud region with its distinct topological features for fusion. A scoring mechanism then creates competition among the regions carrying local-global fused features, and the final pose is predicted by the region with the highest score.
According to the evaluation results on the public and our constructed datasets, the FCF improves estimation accuracy and stability by an average of 13.6% and 12.7%, respectively, compared with nine state-of-the-art methods.
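The fuse-then-compete flow described in the abstract can be sketched in a few lines. Everything below is a hypothetical stand-in, not the paper's actual layers: the sigmoid channel gate, the linear scoring head (`W_score`), and the per-region pose regressor (`W_pose`) are assumptions chosen only to illustrate the data flow — attention-weighted global texture embedded into each local region, then a score-based competition selecting the winning region's pose.

```python
import numpy as np

rng = np.random.default_rng(0)

C = 64   # channels of the assumed common feature subspace
R = 8    # number of local point-cloud regions

# Hypothetical branch outputs, already projected into the common subspace:
# one global texture feature (RGB branch), R local topology features.
texture_global = rng.standard_normal(C)        # (C,)
topology_local = rng.standard_normal((R, C))   # (R, C)

def channel_attention(x):
    """Toy channel weighting: a per-channel sigmoid gate in (0, 1)."""
    gate = 1.0 / (1.0 + np.exp(-x))
    return x * gate

# Fusion: embed the attention-weighted texture feature into every region.
fused = topology_local + channel_attention(texture_global)   # (R, C)

# Competition: a scoring head ranks the fused regions; the highest-scoring
# region's pose estimate is kept as the final prediction.
W_score = rng.standard_normal(C) / np.sqrt(C)
scores = fused @ W_score                       # (R,)
winner = int(np.argmax(scores))

# Each region regresses a pose (yaw, pitch, roll); only the winner counts.
W_pose = rng.standard_normal((C, 3)) / np.sqrt(C)
pose = fused[winner] @ W_pose                  # (3,)
```

In the paper these components are learned networks trained end to end; the sketch only shows why competition helps: regions whose local point cloud is corrupted by sensor noise can score low and be ignored, rather than dragging down an averaged prediction.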
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2024.110285