Fusing Personal and Environmental Cues for Identification and Segmentation of First-Person Camera Wearers in Third-Person Views
As wearable cameras become more popular, an important question emerges: how to identify camera wearers within the perspective of conventional static cameras. The drastic difference between first-person (egocentric) and third-person (exocentric) camera views makes this a challenging task. We present...
Saved in:
Published in | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp. 16477 - 16487 |
---|---|
Main Authors | , , |
Format | Conference Proceeding |
Language | English |
Published |
IEEE
16.06.2024
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | As wearable cameras become more popular, an important question emerges: how to identify camera wearers within the perspective of conventional static cameras. The drastic difference between first-person (egocentric) and third-person (exocentric) camera views makes this a challenging task. We present PersonEnvironmentNet (PEN), a framework designed to integrate information from both the individuals in the two views and geometric cues inferred from the background environment. To facilitate research in this direction, we also present TF2023, a novel dataset comprising synchronized first-person and third-person views, along with masks of camera wearers and labels associating these masks with the respective first-person views. In addition, we propose a novel quantitative metric designed to measure a model's ability to comprehend the relationship between the two views. Our experiments re- veal that PEN outperforms existing methods. The code and dataset are available at https://github.com/ziweizhao1993/PEN. |
---|---|
ISSN: | 2575-7075 |
DOI: | 10.1109/CVPR52733.2024.01559 |