KFC: An Efficient Framework for Semi-Supervised Temporal Action Localization
In temporal action localization (TAL), semi-supervised learning is a promising technique to mitigate the cost of precise boundary annotations. Semi-supervised approaches employing consistency regularization (CR), encouraging models to be robust to the perturbed inputs, have achieved great success in...
Saved in:
Published in | IEEE transactions on image processing Vol. 30; pp. 6869 - 6878 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published |
New York
IEEE
2021
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | In temporal action localization (TAL), semi-supervised learning is a promising technique to mitigate the cost of precise boundary annotations. Semi-supervised approaches employing consistency regularization (CR), encouraging models to be robust to the perturbed inputs, have achieved great success in image classification problems. The success of CR is largely depended on the perturbations, where instances are perturbed to train a robust model without altering their semantic information. However, the perturbations for image or video classification tasks are not fit to apply to TAL. Since videos in TAL are too long to train the model with raw videos in an end-to-end manner. In this paper, we devise a method named K-farthest crossover to construct perturbations based on video features and apply it to TAL. Motivated by the observation that features in the same action instance become more and more similar during the training process while those in different action instances or backgrounds become more and more divergent, we add perturbations to each feature along temporal axis and adopt CR to encourage the model to retain this observation. Specifically, for a feature, we first find the top-k dissimilar features and average them to form a perturbation. Then, similar to chromosomal crossover, we select a large part of the feature and a small part of the perturbation to recombine a perturbed feature, which preserves the feature semantics yet enough discrepancy. |
---|---|
Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
ISSN: | 1057-7149 1941-0042 |
DOI: | 10.1109/TIP.2021.3099407 |