Object-Centric Video Prediction Via Decoupling of Object Dynamics and Interactions


Bibliographic Details
Published in: 2023 IEEE International Conference on Image Processing (ICIP), pp. 570-574
Main Authors: Villar-Corrales, Angel; Wahdan, Ismail; Behnke, Sven
Format: Conference Proceeding
Language: English
Published: IEEE, 08.10.2023

Summary: We present a framework for object-centric video prediction, i.e., parsing a video sequence into objects and modeling their dynamics and interactions in order to predict the future object states from which video frames are rendered. To facilitate the learning of meaningful spatio-temporal object representations and the forecasting of their states, we propose two novel object-centric video prediction (OCVP) transformer modules, which decouple the processing of temporal dynamics and object interactions. We show that OCVP predictors outperform object-agnostic video prediction models on two different datasets. Furthermore, we observe that OCVP modules learn consistent and interpretable object representations. Animations and code to reproduce our results can be found on our project website.
DOI: 10.1109/ICIP49359.2023.10222810
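The decoupling described in the summary can be illustrated with a minimal NumPy sketch (an assumption for illustration, not the authors' implementation): one self-attention pass lets each object slot attend over its own history (temporal dynamics), while a separate pass lets slots within the same timestep attend to each other (object interactions). All function and variable names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product self-attention; leading axes are batch dims
    d = q.shape[-1]
    w = softmax(q @ k.swapaxes(-1, -2) / np.sqrt(d), axis=-1)
    return w @ v

def decoupled_step(slots):
    """One decoupled predictor step on object slots of shape (T, N, D):
    T timesteps, N object slots, D slot dimensions."""
    # temporal attention: each object attends over its own trajectory
    per_object = slots.transpose(1, 0, 2)          # (N, T, D)
    per_object = attention(per_object, per_object, per_object)
    out = per_object.transpose(1, 0, 2)            # back to (T, N, D)
    # relational attention: objects within each frame attend to each other
    out = attention(out, out, out)                 # (T, N, N) weights per frame
    return out

slots = np.random.randn(5, 4, 16)  # 5 frames, 4 objects, 16-dim slots
pred = decoupled_step(slots)
assert pred.shape == (5, 4, 16)
```

Factorizing attention this way keeps the cost at O(N·T²) + O(T·N²) per layer instead of O((N·T)²) for joint attention over all slot-timestep pairs, which is one plausible motivation for such a decoupled design.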