Comparison of Spatio-Temporal Models for Human Motion and Pose Forecasting in Face-to-Face Interaction Scenarios
Proceedings of Machine Learning Research, 2022
Main Authors | , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 07.03.2022 |
Subjects | |
Summary: | Human behavior forecasting during human-human interactions is of utmost importance for providing robotic or virtual agents with social intelligence. This problem is especially challenging for scenarios that are highly driven by interpersonal dynamics. In this work, we present the first systematic comparison of state-of-the-art approaches for behavior forecasting. To do so, we leverage whole-body annotations (face, body, and hands) from the recently released UDIVA v0.5 dataset, which features face-to-face dyadic interactions. Our best attention-based approaches achieve state-of-the-art performance on UDIVA v0.5. We show that by autoregressively predicting the future with methods trained for the short-term future (<400 ms), we outperform the baselines even for a considerably longer-term future (up to 2 s). We also show that this finding holds when highly noisy annotations are used, which opens new horizons towards the use of weakly-supervised learning. Combined with large-scale datasets, this may help boost advances in the field. |
DOI: | 10.48550/arxiv.2203.03245 |