Temporal Pyramid Network With Spatial-Temporal Attention for Pedestrian Trajectory Prediction

Bibliographic Details
Published in IEEE Transactions on Network Science and Engineering, Vol. 9, no. 3, pp. 1006-1019
Main Authors Li, Yuanman; Liang, Rongqin; Wei, Wei; Wang, Wei; Zhou, Jiantao; Li, Xia
Format Journal Article
Language English
Published Piscataway IEEE 01.05.2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN 2327-4697, 2334-329X
DOI 10.1109/TNSE.2021.3065019

Summary: Understanding and predicting human motion behavior with social interactions has become an increasingly crucial problem for a vast number of applications, ranging from visual navigation of autonomous vehicles to activity prediction in intelligent video surveillance. Accurately forecasting crowd motion behavior is challenging due to the multimodal nature of trajectories and the complex social interactions between humans. Recent algorithms model and predict the trajectory at a single resolution, which makes it difficult for them to exploit the long-range and short-range information of the motion behavior simultaneously. In this paper, we propose a temporal pyramid network for pedestrian trajectory prediction through a squeeze modulation and a dilation modulation. The hierarchical design of our framework allows the trajectory to be modeled at multiple resolutions, and can therefore better capture the motion behavior at various tempos. By progressively combining the global context with the local one, we construct a coarse-to-fine hierarchical pedestrian trajectory prediction framework with multi-supervision. Further, we introduce a unified spatial-temporal attention mechanism to adaptively select important information from surrounding persons in both the spatial and temporal domains. We show that our attention strategy is an intuitive and effective way to encode the influence of social interactions. Experimental results on two benchmarks demonstrate the superiority of our proposed scheme.