Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video
Main Authors | , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 25.11.2019 |
Summary: | We address the challenging task of anticipating human-object
interaction in first person videos. Most existing methods ignore how the
camera wearer interacts with the objects, or simply treat body motion as a
separate modality. In contrast, we observe that intentional hand movement
reveals critical information about the future activity. Motivated by this, we
adopt intentional hand movement as a future representation and propose a novel
deep network that jointly models and predicts the egocentric hand motion,
interaction hotspots and future action. Specifically, we consider the future
hand motion as the motor attention, and model this attention using latent
variables in our deep model. The predicted motor attention is further used to
characterise the discriminative spatial-temporal visual features for predicting
actions and interaction hotspots. We present extensive experiments
demonstrating the benefit of the proposed joint model. Importantly, our model
produces new state-of-the-art results for action anticipation on both the
EGTEA Gaze+ and EPIC-Kitchens datasets. Our project page is available at
https://aptx4869lm.github.io/ForecastingHOI/ |
DOI: | 10.48550/arxiv.1911.10967 |
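The summary describes a joint architecture in which future hand motion is modelled as latent-variable motor attention, and the predicted attention then weights spatio-temporal features feeding the interaction-hotspot and action-anticipation heads. Below is a minimal PyTorch sketch of that idea; the class `JointAnticipationSketch`, its layer choices, and all dimensions are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the joint model outlined in the abstract.
# All module names, shapes, and hyper-parameters are assumptions.
import torch
import torch.nn as nn

class JointAnticipationSketch(nn.Module):
    """Jointly predicts motor attention (future hand motion),
    interaction hotspots, and the future action from clip features."""

    def __init__(self, feat_dim=512, latent_dim=64, num_actions=106):
        super().__init__()
        # Latent-variable head for motor attention: mean and log-variance
        # of a Gaussian over a low-dimensional attention code.
        self.attn_mu = nn.Conv3d(feat_dim, latent_dim, kernel_size=1)
        self.attn_logvar = nn.Conv3d(feat_dim, latent_dim, kernel_size=1)
        # Decode the sampled code into a per-location attention map.
        self.attn_decoder = nn.Conv3d(latent_dim, 1, kernel_size=1)
        # Hotspot head: spatial heatmap of likely interaction regions.
        self.hotspot_head = nn.Conv3d(feat_dim, 1, kernel_size=1)
        # Action head: clip-level classification of the future action
        # (num_actions is illustrative).
        self.action_head = nn.Linear(feat_dim, num_actions)

    def forward(self, feats):
        # feats: (B, C, T, H, W) spatio-temporal backbone features.
        mu = self.attn_mu(feats)
        logvar = self.attn_logvar(feats)
        # Reparameterisation trick: sample the latent attention code.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # Motor attention map over space and time, normalised to [0, 1].
        attn = torch.sigmoid(self.attn_decoder(z))
        # Modulate features with the predicted attention (residual weighting),
        # so attention characterises the discriminative features.
        weighted = feats * (1.0 + attn)
        # Interaction hotspots from the attention-weighted features.
        hotspots = torch.sigmoid(self.hotspot_head(weighted))
        # Global average pool for action anticipation.
        pooled = weighted.mean(dim=(2, 3, 4))
        action_logits = self.action_head(pooled)
        return attn, hotspots, action_logits

# Usage on one batch of backbone features.
model = JointAnticipationSketch()
feats = torch.randn(2, 512, 4, 7, 7)
attn, hotspots, logits = model(feats)
```

The residual weighting `feats * (1.0 + attn)` is one simple way to let the predicted attention emphasise features without being able to zero them out entirely; the paper's actual feature-modulation mechanism may differ.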