Continuous conditional video synthesis by neural processes

Bibliographic Details
Published in: Computer Vision and Image Understanding, Vol. 259, p. 104387
Main Authors: Ye, Xi; Bilodeau, Guillaume-Alexandre
Format: Journal Article
Language: English
Published: Elsevier Inc., 01.09.2025
ISSN: 1077-3142
DOI: 10.1016/j.cviu.2025.104387

Summary: Different conditional video synthesis tasks, such as frame interpolation and future frame prediction, are typically addressed individually by task-specific models, despite their shared underlying characteristics. Additionally, most conditional video synthesis models are limited to discrete frame generation at specific integer time steps. This paper presents a unified model that tackles both challenges simultaneously. We demonstrate that conditional video synthesis can be formulated as a neural process, where input spatio-temporal coordinates are mapped to target pixel values by conditioning on context spatio-temporal coordinates and pixel values. Our approach leverages a Transformer-based non-autoregressive conditional video synthesis model that takes the implicit neural representation of coordinates and context pixel features as input. Our task-specific models outperform previous methods for future frame prediction and frame interpolation across multiple datasets. Importantly, our model enables temporally continuous video synthesis at arbitrarily high frame rates, outperforming the previous state of the art. The source code and video demos for our model are available at https://npvp.github.io.

Highlights:
• The first neural process-based unified video synthesis model for multiple tasks.
• Temporally continuous frame generation via implicit neural representations.
• Superior performance in video frame interpolation and continuous prediction.
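The summary describes the core formulation: target pixel values are regressed from spatio-temporal coordinates by cross-attending to context coordinate and pixel pairs. The PyTorch sketch below illustrates that neural-process view. It is a minimal illustration under assumptions, not the authors' implementation: the class and parameter names (FourierFeatures, NeuralProcessVideoSynthesis, num_bands, dim), the Fourier-feature coordinate encoding, and all layer sizes are hypothetical stand-ins; the paper's actual context pixel features are presumably produced by a learned encoder that this sketch does not reproduce.

    import torch
    import torch.nn as nn

    class FourierFeatures(nn.Module):
        """Implicit neural representation of (x, y, t) coordinates via
        sinusoidal (Fourier) features; an assumed, commonly used encoding."""
        def __init__(self, num_bands: int = 8):
            super().__init__()
            self.register_buffer("freqs", 2.0 ** torch.arange(num_bands))

        def forward(self, coords: torch.Tensor) -> torch.Tensor:
            # coords: (B, N, 3) with x, y, t normalized to [0, 1]
            x = coords.unsqueeze(-1) * self.freqs      # (B, N, 3, num_bands)
            x = torch.cat([x.sin(), x.cos()], dim=-1)  # (B, N, 3, 2*num_bands)
            return x.flatten(-2)                       # (B, N, 6*num_bands)

    class NeuralProcessVideoSynthesis(nn.Module):
        """Hypothetical sketch of the neural-process view: a Transformer
        decoder maps target coordinate queries to RGB values by
        cross-attending to context (coordinate, pixel) tokens."""
        def __init__(self, dim: int = 256, num_bands: int = 8,
                     heads: int = 8, layers: int = 6):
            super().__init__()
            coord_dim = 6 * num_bands
            self.encode = FourierFeatures(num_bands)
            self.ctx_proj = nn.Linear(coord_dim + 3, dim)  # coords + RGB
            self.qry_proj = nn.Linear(coord_dim, dim)      # coords only
            layer = nn.TransformerDecoderLayer(dim, heads, batch_first=True)
            self.decoder = nn.TransformerDecoder(layer, num_layers=layers)
            self.to_rgb = nn.Linear(dim, 3)

        def forward(self, ctx_coords, ctx_rgb, tgt_coords):
            ctx = self.ctx_proj(
                torch.cat([self.encode(ctx_coords), ctx_rgb], dim=-1))
            qry = self.qry_proj(self.encode(tgt_coords))
            # All target queries are decoded in one parallel pass
            # (non-autoregressive).
            return self.to_rgb(self.decoder(qry, ctx))     # (B, N_tgt, 3)

    # Toy usage on a 16x16 grid: context pixels come from frames at
    # t = 0 and t = 1; querying the grid at t = 0.5 is interpolation,
    # while t > 1 corresponds to future frame prediction.
    model = NeuralProcessVideoSynthesis()
    B, N_ctx, N_tgt = 1, 2 * 16 * 16, 16 * 16
    ctx_coords = torch.rand(B, N_ctx, 3)
    ctx_rgb = torch.rand(B, N_ctx, 3)
    tgt_coords = torch.rand(B, N_tgt, 3)
    rgb = model(ctx_coords, ctx_rgb, tgt_coords)  # -> (1, 256, 3)

Because the time coordinate t is a continuous input rather than a discrete frame index, interpolation and prediction differ only in which coordinates are queried, which is what permits synthesis at arbitrarily high frame rates from a single model.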