Improved Sliding Window Smoothing for Video Temporal Action Segmentation and Recognition

Bibliographic Details
Published in 2023 China Automation Congress (CAC), pp. 8653-8658
Main Authors Li, Ce; Tian, Yihan; Sheng, Longshuai; Chen, Junzhi; Wang, Tian; Wei, Xianlong
Format Conference Proceeding
Language English
Published IEEE 17.11.2023
Summary: Despite substantial research on human action segmentation in videos, which aims to determine the type and timing of activities, the problem remains unresolved because large-scale annotated data are scarce in video analysis applications. Supervised video action segmentation employs a number of temporal convolutional network (TCN) models to address this problem. The task remains difficult because the temporal boundaries and durations of actions in videos are intricate. To create a soft and flexible video partition, this study incorporates an improved sliding window smoothing (ISWS) technique into a TCN baseline model. When screening the predicted segmentation sequence of the target video, the method selects three discriminative frames and integrates them into an adaptive sliding window to optimize the smoothing of the entire prediction sequence more specifically. Notably, a doubling penalty mechanism is applied when the window slides to a position of the wrong category. To learn the outcomes of effective and ineffective segmentation paths, a new loss function is designed that smooths the candidate frames of the segmentation points in the sliding window using the ISWS scheme. In this way, the method effectively enlarges the receptive field of video segmentation to obtain the optimal action segmentation. Experiments on the Breakfast, 50Salads, and GTEA datasets demonstrate that the method significantly improves frame accuracy for action segmentation in videos compared with state-of-the-art techniques.
ISSN: 2688-0938
DOI: 10.1109/CAC59555.2023.10450614
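
Illustrative note: the summary describes smoothing a frame-level prediction sequence with a sliding window and evaluating by frame accuracy. The sketch below shows only the generic idea, using a plain centered majority-vote window. It is not the ISWS method of the paper: the three discriminative frames, the adaptive window, the doubling penalty, and the loss function are not specified in this record and are not reproduced here. The function names `smooth_labels` and `frame_accuracy`, the `window` parameter, and the example labels are assumptions made for illustration.

```python
from collections import Counter


def smooth_labels(labels, window=5):
    """Majority-vote smoothing of per-frame labels with a centered window.

    Generic baseline for illustration only; NOT the paper's ISWS scheme.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        # Clip the window at the start and end of the sequence.
        lo = max(0, i - half)
        hi = min(len(labels), i + half + 1)
        counts = Counter(labels[lo:hi])
        best = max(counts.values())
        if counts[labels[i]] == best:
            # Keep the original label when it is (tied for) most frequent.
            smoothed.append(labels[i])
        else:
            smoothed.append(counts.most_common(1)[0][0])
    return smoothed


def frame_accuracy(predicted, truth):
    """Fraction of frames whose predicted label matches the ground truth."""
    if len(predicted) != len(truth):
        raise ValueError("sequences must have the same length")
    if not truth:
        return 0.0
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical per-frame predictions with one spurious "cut" frame.
raw = ["pour", "pour", "cut", "pour", "pour", "cut", "cut", "cut"]
truth = ["pour"] * 5 + ["cut"] * 3
smoothed = smooth_labels(raw, window=3)
print(frame_accuracy(raw, truth), frame_accuracy(smoothed, truth))
```

In this toy sequence the isolated "cut" frame inside the "pour" segment is voted out, which removes an over-segmentation error. A fixed window like this can also blur true boundaries between short actions, which is the kind of limitation an adaptive scheme is meant to address.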