K-centered Patch Sampling for Efficient Video Recognition

Bibliographic Details
Published in	Computer Vision - ECCV 2022 Vol. 13695; pp. 160 - 176
Main Authors	Park, Seong Hyeon, Tack, Jihoon, Heo, Byeongho, Ha, Jung-Woo, Shin, Jinwoo
Format	Book Chapter
Language	English
Published	Switzerland: Springer, 2022 (Springer Nature Switzerland)
Series	Lecture Notes in Computer Science
Subjects	K-center search; Efficient video recognition; Farthest point sampling; Patch sampling; Video transformers

Summary:	For decades, it has been common practice to choose a subset of video frames to reduce the computational burden of a video understanding model. In this paper, we argue that this popular heuristic might be sub-optimal for recent transformer-based models. Specifically, inspired by the fact that transformers are built upon patches of video frames, we propose to sample patches rather than frames using the greedy K-center search, i.e., the patch farthest from those chosen so far is sampled iteratively. We then show that a transformer trained with the selected video patches can outperform its baseline trained with video frames sampled in the traditional way. Furthermore, by adding a certain spatiotemporal structuredness condition, the proposed K-centered patch sampling can even be applied to recent sophisticated video transformers, boosting their performance further. We demonstrate the superiority of our method on the Something-Something and Kinetics datasets.
Bibliography:	Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-19833-5_10.
ISBN:	9783031198328 3031198328
ISSN:	0302-9743 1611-3349
DOI:	10.1007/978-3-031-19833-5_10
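
The greedy K-center search named in the summary (also known as farthest point sampling) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `greedy_k_center`, the use of plain coordinate tuples as stand-ins for patch features, the Euclidean distance, and the choice of the first center are all assumptions made for illustration, and the spatiotemporal structuredness condition mentioned in the summary is not modeled here.

```python
import math


def greedy_k_center(points, k, first=0):
    """Select k indices by farthest-point sampling (greedy K-center).

    points: list of equal-length coordinate tuples (hypothetical patch features).
    k:      number of points to select, 1 <= k <= len(points).
    first:  index of the initial center (an arbitrary choice in this sketch).
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [first]
    # For unselected points: distance to the nearest selected center.
    min_dist = [math.dist(p, points[first]) for p in points]
    # Selected points are marked with -1 so they are never picked again.
    min_dist[first] = -1.0
    while len(selected) < k:
        # The point farthest from everything chosen so far is the next center.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        min_dist[nxt] = -1.0
        # Shrink each remaining distance if the new center is closer.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy data: two tight clusters plus one isolated point.
toy_points = [(0, 0), (1, 0), (10, 0), (10, 1), (5, 5)]
print(greedy_k_center(toy_points, 3))  # [0, 3, 4]
```

In the toy run, the sampler starts at index 0, jumps to the most distant point (index 3), and then picks the isolated point (index 4) rather than a near-duplicate of either chosen point, which is the behavior that makes this kind of sampling favor diverse selections over redundant ones.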