First person video summarization using different graph representations

Highlights:
• Graph based video shot boundary detection using mutual information.
• Graph based centrality measure to select a representative frame within a shot.
• A center-surround model from spectral dissimilarity of two graphs.
• MST based clustering in video similarity graph with new edge inadmissibility measure.

Bibliographic Details
Published in: Pattern Recognition Letters, Vol. 146, pp. 185–192
Main Authors: Sahu, Abhimanyu; Chowdhury, Ananda S.
Format: Journal Article
Language: English
Published: Amsterdam, Elsevier B.V., 01.06.2021

Summary: First-person video summarization has emerged as an important research problem for the computer vision and multimedia communities. In this paper, we show how different graph representations can be developed to summarize first-person (egocentric) videos accurately and in a computationally efficient manner.

Each frame in a video is first represented as a weighted graph, and a shot boundary detection method using graph-based mutual information is developed. A weighted graph is then constructed for each shot, and a representative frame is selected from it using a graph centrality measure. Next, a new way of characterizing egocentric video frames with a graph-based center-surround model is presented: each representative frame is modeled as the union of a center region (graph) and a surround region (graph), and the optimal center and surround regions are determined by exploiting spectral measures of dissimilarity between the two graphs. The optimal regions for all frames within a shot are kept the same as those of the representative frame. Center-surround differences in entropy and optical flow values, along with PHOG (Pyramidal HOG) features, are then extracted from each frame.

Finally, all frames in a video are represented by another weighted graph, termed a Video Similarity Graph (VSG), and clustered by applying a Minimum Spanning Tree (MST) based approach with a new measure for inadmissible edges. The frames closest to the centroid of each cluster are selected to build the summary. Experimental evaluation on two benchmark datasets indicates the advantage of the proposed formulation.
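The abstract describes shot boundary detection driven by graph-based mutual information but does not spell out the formulation. As a minimal sketch of the general idea only, the snippet below uses plain mutual information between intensity histograms of consecutive grayscale frames, declaring a boundary wherever the MI dips below a threshold; the function names and the threshold value are assumptions, not the authors' method.

```python
import numpy as np

def mutual_information(frame_a, frame_b, bins=32):
    # Joint intensity histogram of the two frames (assumes 8-bit grayscale).
    joint, _, _ = np.histogram2d(frame_a.ravel(), frame_b.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of frame_a
    py = pxy.sum(axis=0, keepdims=True)   # marginal of frame_b
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def detect_shot_boundaries(frames, threshold=0.5):
    # A boundary is declared where consecutive frames share little information:
    # within a shot MI stays high; across a cut the joint histogram approaches
    # the product of the marginals and MI drops toward zero.
    return [i + 1 for i in range(len(frames) - 1)
            if mutual_information(frames[i], frames[i + 1]) < threshold]
```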
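For picking a representative frame inside a shot, the abstract only says "a graph centrality measure" without naming it. A hedged sketch under that reading: build a complete similarity graph over the shot's frames and take the most central node. Eigenvector centrality, the Gaussian similarity weighting, and the use of networkx are all my stand-in choices, not details from the paper.

```python
import numpy as np
import networkx as nx

def representative_frame(features):
    # features: (n_frames, d) array of per-frame descriptors.
    n = len(features)
    g = nx.Graph()
    for i in range(n):
        for j in range(i + 1, n):
            # Gaussian similarity: near-duplicate frames get weight close to 1.
            sim = float(np.exp(-np.linalg.norm(features[i] - features[j])))
            g.add_edge(i, j, weight=sim)
    # The most central node is the frame best connected to all the others.
    cent = nx.eigenvector_centrality(g, weight="weight", max_iter=1000)
    return max(cent, key=cent.get)
```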
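The center-surround model compares two graphs spectrally. One standard notion of spectral dissimilarity, shown here purely as a stand-in for whatever measure the paper defines, is the distance between the leading eigenvalues of the two graphs' normalized Laplacians:

```python
import numpy as np
import networkx as nx

def spectral_dissimilarity(g_center, g_surround, k=10):
    # Compare the k smallest normalized-Laplacian eigenvalues of each graph;
    # spectra shorter than k are zero-padded to a common length.
    def leading_spectrum(g):
        lam = np.sort(nx.normalized_laplacian_spectrum(g))[:k]
        return np.pad(lam, (0, max(0, k - len(lam))))
    return float(np.linalg.norm(leading_spectrum(g_center)
                                - leading_spectrum(g_surround)))
```

Under the paper's formulation the optimal center/surround split would be the partition that extremizes such a dissimilarity; that search loop is omitted here.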
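Finally, the summary is produced by MST-based clustering on the VSG with a new inadmissible-edge measure, which the abstract does not define. The sketch below substitutes the classic Zahn-style criterion (cut MST edges heavier than mean + c·std), then picks the frame nearest each cluster centroid as a keyframe; the constant `c` and the Euclidean edge weights are assumptions.

```python
import numpy as np
import networkx as nx

def mst_summary(features, c=1.0):
    # features: (n_frames, d) numpy array of per-frame descriptors.
    n = len(features)
    if n < 2:
        return list(range(n))

    # Video Similarity Graph: complete graph with Euclidean frame distances.
    g = nx.Graph()
    for i in range(n):
        for j in range(i + 1, n):
            g.add_edge(i, j,
                       weight=float(np.linalg.norm(features[i] - features[j])))

    # Cluster by deleting "inadmissible" MST edges (Zahn-style stand-in for
    # the paper's own inadmissibility measure).
    mst = nx.minimum_spanning_tree(g, weight="weight")
    w = np.array([d["weight"] for _, _, d in mst.edges(data=True)])
    cutoff = w.mean() + c * w.std()
    mst.remove_edges_from([(u, v) for u, v, d in mst.edges(data=True)
                           if d["weight"] > cutoff])

    # One keyframe per cluster: the frame nearest the cluster centroid.
    keyframes = []
    for comp in nx.connected_components(mst):
        idx = sorted(comp)
        centroid = features[idx].mean(axis=0)
        keyframes.append(idx[int(np.argmin(
            [np.linalg.norm(features[i] - centroid) for i in idx]))])
    return sorted(keyframes)
```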
ISSN: 0167-8655
EISSN: 1872-7344
DOI: 10.1016/j.patrec.2021.03.013