First person video summarization using different graph representations

Highlights:
• Graph based video shot boundary detection using mutual information.
• Graph based centrality measure to select a representative frame within a shot.
• A center-surround model from spectral dissimilarity of two graphs.
• MST based clustering in video similarity graph with new edge inadmissibility measure.

Bibliographic Details
Published in: Pattern Recognition Letters, Vol. 146, pp. 185–192
Main Authors: Sahu, Abhimanyu; Chowdhury, Ananda S.
Format: Journal Article
Language: English
Published: Amsterdam, Elsevier B.V., 01.06.2021

Summary: First-person video summarization has emerged as an important research problem for the computer vision and multimedia communities. In this paper, we show how different graph representations can be developed to summarize first-person (egocentric) videos accurately and in a computationally efficient manner.

Each frame in a video is first represented as a weighted graph, and a shot boundary detection method using graph-based mutual information is developed. A weighted graph is then constructed for each shot, and a representative frame is selected from it using a graph centrality measure. Next, a new way of characterizing egocentric video frames with a graph-based center-surround model is presented: each representative frame is modeled as the union of a center region (graph) and a surround region (graph), and the optimal center and surround regions are determined by exploiting spectral measures of dissimilarity between the two graphs. The optimal regions for all frames within a shot are kept the same as those of the representative frame. Center-surround differences in entropy and optical flow values, along with PHOG (Pyramidal HOG) features, are then extracted from each frame.

Finally, all frames in a video are represented by another weighted graph, termed a Video Similarity Graph (VSG), and clustered by applying a Minimum Spanning Tree (MST) based approach with a new measure for inadmissible edges. The frames closest to the centroid of each cluster are selected to build the summary. Experimental evaluation on two benchmark datasets indicates the advantage of the proposed formulation.
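The abstract describes shot boundary detection driven by graph-based mutual information but does not spell out the formulation. As a minimal sketch of the general idea only, the snippet below uses plain mutual information between intensity histograms of consecutive grayscale frames, declaring a boundary wherever the MI dips below a threshold; the function names and the threshold value are assumptions, not the authors' method.

```python
import numpy as np

def mutual_information(frame_a, frame_b, bins=32):
    # Joint intensity histogram of the two frames (assumes 8-bit grayscale).
    joint, _, _ = np.histogram2d(frame_a.ravel(), frame_b.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of frame_a
    py = pxy.sum(axis=0, keepdims=True)   # marginal of frame_b
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def detect_shot_boundaries(frames, threshold=0.5):
    # A boundary is declared where consecutive frames share little information:
    # within a shot MI stays high; across a cut the joint histogram approaches
    # the product of the marginals and MI drops toward zero.
    return [i + 1 for i in range(len(frames) - 1)
            if mutual_information(frames[i], frames[i + 1]) < threshold]
```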
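For picking a representative frame inside a shot, the abstract only says "a graph centrality measure" without naming it. A hedged sketch under that reading: build a complete similarity graph over the shot's frames and take the most central node. Eigenvector centrality, the Gaussian similarity weighting, and the use of networkx are all my stand-in choices, not details from the paper.

```python
import numpy as np
import networkx as nx

def representative_frame(features):
    # features: (n_frames, d) array of per-frame descriptors.
    n = len(features)
    g = nx.Graph()
    for i in range(n):
        for j in range(i + 1, n):
            # Gaussian similarity: near-duplicate frames get weight close to 1.
            sim = float(np.exp(-np.linalg.norm(features[i] - features[j])))
            g.add_edge(i, j, weight=sim)
    # The most central node is the frame best connected to all the others.
    cent = nx.eigenvector_centrality(g, weight="weight", max_iter=1000)
    return max(cent, key=cent.get)
```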
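The center-surround model compares two graphs spectrally. One standard notion of spectral dissimilarity, shown here purely as a stand-in for whatever measure the paper defines, is the distance between the leading eigenvalues of the two graphs' normalized Laplacians:

```python
import numpy as np
import networkx as nx

def spectral_dissimilarity(g_center, g_surround, k=10):
    # Compare the k smallest normalized-Laplacian eigenvalues of each graph;
    # spectra shorter than k are zero-padded to a common length.
    def leading_spectrum(g):
        lam = np.sort(nx.normalized_laplacian_spectrum(g))[:k]
        return np.pad(lam, (0, max(0, k - len(lam))))
    return float(np.linalg.norm(leading_spectrum(g_center)
                                - leading_spectrum(g_surround)))
```

Under the paper's formulation the optimal center/surround split would be the partition that extremizes such a dissimilarity; that search loop is omitted here.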
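Finally, the summary is produced by MST-based clustering on the VSG with a new inadmissible-edge measure, which the abstract does not define. The sketch below substitutes the classic Zahn-style criterion (cut MST edges heavier than mean + c·std), then picks the frame nearest each cluster centroid as a keyframe; the constant `c` and the Euclidean edge weights are assumptions.

```python
import numpy as np
import networkx as nx

def mst_summary(features, c=1.0):
    # features: (n_frames, d) numpy array of per-frame descriptors.
    n = len(features)
    if n < 2:
        return list(range(n))

    # Video Similarity Graph: complete graph with Euclidean frame distances.
    g = nx.Graph()
    for i in range(n):
        for j in range(i + 1, n):
            g.add_edge(i, j,
                       weight=float(np.linalg.norm(features[i] - features[j])))

    # Cluster by deleting "inadmissible" MST edges (Zahn-style stand-in for
    # the paper's own inadmissibility measure).
    mst = nx.minimum_spanning_tree(g, weight="weight")
    w = np.array([d["weight"] for _, _, d in mst.edges(data=True)])
    cutoff = w.mean() + c * w.std()
    mst.remove_edges_from([(u, v) for u, v, d in mst.edges(data=True)
                           if d["weight"] > cutoff])

    # One keyframe per cluster: the frame nearest the cluster centroid.
    keyframes = []
    for comp in nx.connected_components(mst):
        idx = sorted(comp)
        centroid = features[idx].mean(axis=0)
        keyframes.append(idx[int(np.argmin(
            [np.linalg.norm(features[i] - centroid) for i in idx]))])
    return sorted(keyframes)
```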
ISSN: 0167-8655
EISSN: 1872-7344
DOI: 10.1016/j.patrec.2021.03.013