Gaze Estimation using Transformer

Bibliographic Details
Published in: International Conference on Pattern Recognition, pp. 3341-3347
Main Authors: Cheng, Yihua; Lu, Feng
Format: Conference Proceeding
Language: English
Published: IEEE, 21.08.2022
ISSN: 2831-7475
DOI: 10.1109/ICPR56361.2022.9956687

Summary: Recent work has proved the effectiveness of transformers in many computer vision tasks. However, the performance of transformers in gaze estimation is still unexplored. In this paper, we employ transformers and assess their effectiveness for gaze estimation. We consider two forms of vision transformer: pure transformers and hybrid transformers. We first follow the popular ViT and employ a pure transformer to estimate gaze from images. Alternatively, we preserve the convolutional layers and integrate CNNs with transformers, where the transformer serves as a component that complements the CNN. We compare the performance of the two transformers in gaze estimation. The hybrid transformer significantly outperforms the pure transformer on all evaluation datasets while using fewer parameters. We further conduct experiments to assess the effectiveness of the hybrid transformer and to explore the advantage of the self-attention mechanism. Experiments show the hybrid transformer achieves state-of-the-art performance on all benchmarks with pre-training. To facilitate further research, we release code and models at https://github.com/yihuacheng/GazeTR.
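The hybrid design the abstract describes (a CNN feature map whose spatial positions are fed as tokens into a transformer's self-attention) can be illustrated with a minimal NumPy sketch. This is not the authors' GazeTR implementation (released at the repository above); the feature-map size, token dimension, weight matrices, and the 2D (yaw, pitch) output head are all assumed placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    # tokens: (N, d), one row per spatial position of the CNN feature map.
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])      # (N, N) pairwise attention
    return softmax(scores, axis=-1) @ V         # attended tokens, (N, d)

rng = np.random.default_rng(0)
d = 32                                          # token dimension (assumed)
# Stand-in for a 7x7 CNN feature map flattened into 49 tokens of size d.
feat = rng.standard_normal((7 * 7, d))
Wq, Wk, Wv = (0.1 * rng.standard_normal((d, d)) for _ in range(3))

attended = self_attention(feat, Wq, Wk, Wv)     # (49, d)
# Pool the attended tokens and regress a 2D gaze direction (yaw, pitch).
gaze = attended.mean(axis=0) @ rng.standard_normal((d, 2))
print(gaze.shape)  # (2,)
```

In this sketch the CNN supplies local features while self-attention lets every spatial position weigh every other one, which is the complementary role the abstract assigns to the transformer component.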