Human-Centric Spatio-Temporal Video Grounding With Visual Transformers
In this work, we introduce a novel task: Human-centric Spatio-Temporal Video Grounding (HC-STVG). Unlike existing referring expression tasks in images or videos, HC-STVG focuses on humans and aims to localize a spatio-temporal tube of the target person from an untrimmed video based on a given...
Published in | IEEE Transactions on Circuits and Systems for Video Technology, Vol. 32, no. 12, pp. 8238-8249 |
---|---|
Format | Journal Article |
Language | English |
Published | New York, IEEE, 01.12.2022, The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |