Collaborative knowledge distillation for incomplete multi-view action prediction


Bibliographic Details
Published in: Image and Vision Computing, Vol. 107, p. 104111
Main Authors: Kumar, Deepak; Kumar, Chetan; Shao, Ming
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.03.2021

Summary: Predicting future actions is a key task in visual understanding, surveillance, and human behavior analysis. Current methods for video-based prediction primarily use single-view data, while in the real world multiple cameras and the videos they produce are readily available and may benefit action prediction tasks. However, the multi-view setting brings a new challenge: subjects in the videos are more likely to be occluded by objects when captured from different angles, or to suffer from signal jittering during transmission. To that end, in this paper we propose a novel student network called Collaborative Knowledge Distillation (CKD) to predict human actions with missing information under a multi-view setting, i.e., incomplete multi-view action prediction. First, we create a graph-attention-based teacher model capable of fusing multi-view video features for the prediction task. Second, we construct a corruption pattern bank (CPB) to simulate various missing segments in multi-view video; each student model manages one pattern through privileged information and knowledge distillation. Third, to account for arbitrary missing video segments in the real world, an ensemble of student models is developed to make a joint prediction. The proposed framework has been extensively evaluated on popular multi-view visual action datasets, including PKU-MMD and NTU RGB+D, to validate the effectiveness of our approach. To the best of our knowledge, action prediction has not yet been explored in the multi-view setting.
•Multi-view methods assume complete observation and fuse both views
•Early event detection should be made on incomplete observations
•A missing view, or a partial view blocked by an object, can occur at any time
•Graph attention + knowledge distillation to mitigate the corrupted data
•Ensemble learning to predict actions under unseen data corruption
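The summary describes two mechanisms that can be illustrated concretely: simulating a missing-segment pattern from a corruption pattern bank, and training a student against a teacher via a temperature-softened distillation loss. The sketch below is a minimal, hypothetical illustration of those two ideas only (the function names, the toy CPB, and the feature shapes are assumptions, not the paper's actual implementation):

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T produces softer distributions.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, T=2.0):
    # KL divergence between softened teacher and student outputs,
    # scaled by T^2 (standard Hinton-style knowledge distillation).
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))) * T * T)

def apply_corruption_pattern(segments, pattern):
    # Zero out the video-segment features marked as missing (0) by
    # one pattern drawn from a hypothetical corruption pattern bank.
    mask = np.asarray(pattern, dtype=float)[:, None]
    return segments * mask

# Toy CPB: each row says which of 4 segments are observed (1) or missing (0);
# in the paper's framework one student model would handle each pattern.
cpb = [[1, 1, 0, 1],
       [0, 1, 1, 1]]
segments = np.ones((4, 8))               # 4 segments, 8-dim features each
corrupted = apply_corruption_pattern(segments, cpb[0])
loss = distillation_loss([2.0, 0.5, -1.0], [1.5, 0.8, -0.5])
```

In this view, each student sees inputs masked by its assigned CPB pattern but is trained to match the teacher's predictions computed on the complete multi-view input, which is how privileged (full-view) information reaches the students.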
ISSN: 0262-8856; 1872-8138
DOI: 10.1016/j.imavis.2021.104111