Highlight Detection with Pairwise Deep Ranking for First-Person Video Summarization
| Published in | 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 982-990 |
|---|---|
| Main Authors | Ting Yao, Tao Mei, Yong Rui |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.06.2016 |
| ISSN | 1063-6919 |
| DOI | 10.1109/CVPR.2016.112 |
Summary:

The emergence of wearable devices such as portable cameras and smart glasses makes it possible to record life-logging first-person videos. Browsing such long, unstructured videos is time-consuming and tedious. This paper studies the discovery of the moments of a user's major or special interest (i.e., highlights) in a video, for generating summaries of first-person videos. Specifically, we propose a novel pairwise deep ranking model that employs deep learning techniques to learn the relationship between highlight and non-highlight video segments. A two-stream network structure, which represents video segments by complementary information on the appearance of video frames and the temporal dynamics across frames, is developed for video highlight detection. Given a long personal video, the highlight detection model assigns a highlight score to each segment. The obtained highlight segments are applied to summarization in two ways: video time-lapse and video skimming. The former plays the highlight (non-highlight) segments at a low (high) speed rate, while the latter assembles the sequence of segments with the highest scores. On 100 hours of first-person videos covering 15 unique sports categories, our highlight detection improves over the state-of-the-art RankSVM method by 10.5% in terms of accuracy. Moreover, our approaches produce video summaries of better quality, as judged by a user study with 35 human subjects.
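The core idea summarized above is a scoring model trained on pairs of highlight and non-highlight segments, so that highlights receive higher scores. The following is a minimal sketch of such a pairwise ranking objective, assuming PyTorch; the single-stream scorer, layer sizes, feature dimension, and margin are illustrative stand-ins, not the paper's two-stream architecture or its reported configuration.

```python
# Minimal sketch of a pairwise deep ranking objective (assumptions: PyTorch;
# a single feature stream stands in for the paper's two-stream design).
import torch
import torch.nn as nn

class HighlightScorer(nn.Module):
    """Maps a precomputed segment feature vector to a scalar highlight score."""
    def __init__(self, feat_dim=4096):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 1),
        )

    def forward(self, x):
        return self.mlp(x).squeeze(-1)  # one score per segment

def pairwise_ranking_loss(score_pos, score_neg, margin=1.0):
    """Hinge-style loss: a highlight segment should outscore a non-highlight
    segment from the same video by at least `margin`."""
    return torch.clamp(margin - (score_pos - score_neg), min=0).mean()

# Toy usage with random features for paired highlight / non-highlight segments.
scorer = HighlightScorer(feat_dim=4096)
pos = torch.randn(8, 4096)  # highlight segment features
neg = torch.randn(8, 4096)  # non-highlight segment features
loss = pairwise_ranking_loss(scorer(pos), scorer(neg))
loss.backward()
```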
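Once each segment has a highlight score, the two summarization modes described in the summary can be sketched as below; the skim budget, score threshold, and speed factors are hypothetical choices for illustration, not the paper's settings.

```python
# Minimal sketch of the two summarization modes: video skimming and time-lapse.
# Assumes per-segment highlight scores have already been computed.

def video_skimming(scores, budget):
    """Keep the `budget` highest-scoring segments, returned in temporal order."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:budget]
    return sorted(top)

def time_lapse_speeds(scores, threshold, slow=1.0, fast=4.0):
    """Play highlight segments at normal (low) speed and the rest at high speed."""
    return [slow if s >= threshold else fast for s in scores]

scores = [0.1, 0.9, 0.4, 0.8, 0.2]          # toy highlight scores per segment
print(video_skimming(scores, budget=2))       # -> [1, 3]
print(time_lapse_speeds(scores, threshold=0.5))  # -> [4.0, 1.0, 4.0, 1.0, 4.0]
```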