Random Walks for Temporal Action Segmentation with Timestamp Supervision

Bibliographic Details
Published in: 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 6600-6610
Main Authors: Hirsch, Roy; Cohen, Regev; Golany, Tomer; Freedman, Daniel; Rivlin, Ehud
Format: Conference Proceeding
Language: English
Published: IEEE, 03.01.2024

Summary: Temporal action segmentation is a high-level video understanding task, commonly formulated as frame-wise classification of untrimmed videos into predefined actions. Fully-supervised deep-learning approaches require dense video annotations, which are costly in both time and money. Furthermore, the temporal boundaries between consecutive actions are typically not well defined, leading to inherent ambiguity and inter-rater disagreement. A promising approach to remedying these limitations is timestamp supervision, which requires only one labeled frame per action instance in a training video. In this work, we reformulate the task of temporal segmentation as a graph segmentation problem with weakly-labeled vertices. We introduce an efficient segmentation method based on random walks on graphs, obtained by solving a sparse system of linear equations. Furthermore, the proposed technique can be employed in any one, or any combination, of the following forms: (1) as a standalone solution for generating dense pseudo-labels from timestamps; (2) as a training loss; (3) as a smoothing mechanism for intermediate predictions. Extensive experiments on three datasets (50Salads, Breakfast, GTEA) show that our method is competitive with the state of the art and allows the identification of regions of uncertainty around action boundaries.
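To make the random-walk formulation concrete, below is a minimal sketch of use (1), generating dense pseudo-labels from timestamps, in the style of Grady-type random walker segmentation on a temporal chain graph. This is not the authors' released code: the frame features, the edge-weighting parameter beta, and the function name are illustrative assumptions.

```python
# Sketch: random-walk pseudo-labeling of video frames from timestamp seeds.
# Assumed inputs: per-frame embeddings and one labeled frame per action instance.
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import splu

def random_walk_pseudo_labels(features, seed_idx, seed_labels, num_classes, beta=50.0):
    """features: (T, D) frame embeddings; seed_idx, seed_labels: one
    annotated timestamp (frame index, class) per action instance."""
    T = features.shape[0]

    # Edge weights between consecutive frames: large when features are similar,
    # so a random walker rarely crosses likely action boundaries.
    diffs = np.sum((features[1:] - features[:-1]) ** 2, axis=1)
    w = np.exp(-beta * diffs / (diffs.mean() + 1e-8))

    # Combinatorial Laplacian L = D - W of the temporal chain graph.
    rows = np.concatenate([np.arange(T - 1), np.arange(1, T)])
    cols = np.concatenate([np.arange(1, T), np.arange(T - 1)])
    W = csr_matrix((np.concatenate([w, w]), (rows, cols)), shape=(T, T))
    L = (diags(np.asarray(W.sum(axis=1)).ravel()) - W).tocsr()

    seeded = np.asarray(seed_idx)
    unseeded = np.setdiff1d(np.arange(T), seeded)

    # Partition L into seeded/unseeded blocks and solve the sparse linear
    # system L_U X = -B M for the class probabilities of unseeded frames.
    L_U = L[unseeded][:, unseeded].tocsc()
    B = L[unseeded][:, seeded]
    M = np.eye(num_classes)[seed_labels]   # one-hot seed labels, (|S|, C)
    X = splu(L_U).solve(-(B @ M))          # (|U|, C) arrival probabilities

    probs = np.zeros((T, num_classes))
    probs[seeded] = M
    probs[unseeded] = X
    # Argmax yields dense pseudo-labels; low-margin frames mark uncertain
    # regions around action boundaries.
    return probs.argmax(axis=1), probs
```

For example, `random_walk_pseudo_labels(feats, [10, 150, 400], [0, 2, 1], num_classes=3)` would propagate three timestamp labels over all frames, with the returned probabilities indicating where the segmentation is uncertain.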
ISSN: 2642-9381
DOI: 10.1109/WACV57701.2024.00648