Random Walks for Temporal Action Segmentation with Timestamp Supervision

Bibliographic Details
Published in: 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 6600-6610
Main Authors: Hirsch, Roy; Cohen, Regev; Golany, Tomer; Freedman, Daniel; Rivlin, Ehud
Format: Conference Proceeding
Language: English
Published: IEEE, 03.01.2024

Summary: Temporal action segmentation is a high-level video understanding task, commonly formulated as frame-wise classification of untrimmed videos into predefined actions. Fully-supervised deep-learning approaches require dense video annotations, which are costly in both time and money. Furthermore, the temporal boundaries between consecutive actions are typically not well defined, leading to inherent ambiguity and inter-rater disagreement. A promising approach to remedying these limitations is timestamp supervision, which requires only one labeled frame per action instance in a training video. In this work, we reformulate the task of temporal segmentation as a graph segmentation problem with weakly-labeled vertices. We introduce an efficient segmentation method based on random walks on graphs, obtained by solving a sparse system of linear equations. Furthermore, the proposed technique can be employed in any one, or any combination, of the following forms: (1) as a standalone solution for generating dense pseudo-labels from timestamps; (2) as a training loss; (3) as a smoothing mechanism for intermediate predictions. Extensive experiments on three datasets (50Salads, Breakfast, GTEA) show that our method is competitive with the state of the art and allows the identification of regions of uncertainty around action boundaries.
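To make the random-walk formulation concrete, below is a minimal sketch of use (1), generating dense pseudo-labels from timestamps, in the style of Grady-type random walker segmentation on a temporal chain graph. This is not the authors' released code: the frame features, the edge-weighting parameter beta, and the function name are illustrative assumptions.

```python
# Sketch: random-walk pseudo-labeling of video frames from timestamp seeds.
# Assumed inputs: per-frame embeddings and one labeled frame per action instance.
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import splu

def random_walk_pseudo_labels(features, seed_idx, seed_labels, num_classes, beta=50.0):
    """features: (T, D) frame embeddings; seed_idx, seed_labels: one
    annotated timestamp (frame index, class) per action instance."""
    T = features.shape[0]

    # Edge weights between consecutive frames: large when features are similar,
    # so a random walker rarely crosses likely action boundaries.
    diffs = np.sum((features[1:] - features[:-1]) ** 2, axis=1)
    w = np.exp(-beta * diffs / (diffs.mean() + 1e-8))

    # Combinatorial Laplacian L = D - W of the temporal chain graph.
    rows = np.concatenate([np.arange(T - 1), np.arange(1, T)])
    cols = np.concatenate([np.arange(1, T), np.arange(T - 1)])
    W = csr_matrix((np.concatenate([w, w]), (rows, cols)), shape=(T, T))
    L = (diags(np.asarray(W.sum(axis=1)).ravel()) - W).tocsr()

    seeded = np.asarray(seed_idx)
    unseeded = np.setdiff1d(np.arange(T), seeded)

    # Partition L into seeded/unseeded blocks and solve the sparse linear
    # system L_U X = -B M for the class probabilities of unseeded frames.
    L_U = L[unseeded][:, unseeded].tocsc()
    B = L[unseeded][:, seeded]
    M = np.eye(num_classes)[seed_labels]   # one-hot seed labels, (|S|, C)
    X = splu(L_U).solve(-(B @ M))          # (|U|, C) arrival probabilities

    probs = np.zeros((T, num_classes))
    probs[seeded] = M
    probs[unseeded] = X
    # Argmax yields dense pseudo-labels; low-margin frames mark uncertain
    # regions around action boundaries.
    return probs.argmax(axis=1), probs
```

For example, `random_walk_pseudo_labels(feats, [10, 150, 400], [0, 2, 1], num_classes=3)` would propagate three timestamp labels over all frames, with the returned probabilities indicating where the segmentation is uncertain.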
ISSN: 2642-9381
DOI: 10.1109/WACV57701.2024.00648