Coherent Deep-Net Fusion To Classify Shots In Concert Videos

Bibliographic Details
Published in	IEEE Transactions on Multimedia, Vol. 20, no. 11, pp. 3123–3136
Main Authors	Lin, Jen-Chun; Wei, Wen-Li; Liu, Tyng-Luh; Yang, Yi-Hsuan; Wang, Hsin-Min; Tyan, Hsiao-Rong; Liao, Hong-Yuan Mark
Format	Journal Article
Language	English
Published	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.11.2018
Subjects	Artificial neural networks; Bayesian analysis; Classification; Classifiers; Convolutional neural networks; Feature extraction; Head; Hierarchies; Image classification; Image color analysis; Image enhancement; Language of film; Live concert; Live performance; Mashups; Model accuracy; Neural networks; Object recognition; Shot; Statistical analysis; Task analysis; Types of shots; Videos; Visualization

More Information
Summary:	Varying the type of shot is a fundamental element in the language of film, commonly used by directors for visual storytelling. The technique is often used in creating professional recordings of a live concert, but may not be applied appropriately in audience recordings of the same event. Such variations make the task of classifying shots in concert videos, professional or amateur, very challenging. To achieve more reliable shot classification, we propose a novel probabilistic approach, named coherent classification net (CC-Net), that addresses three crucial issues. First, we focus on learning more effective features by fusing the layer-wise outputs extracted from a deep convolutional neural network (CNN) pretrained on a large-scale data set for object recognition. Second, we introduce a frame-wise classification scheme, the error weighted deep cross-correlation model (EW-Deep-CCM), to boost classification accuracy. Specifically, the deep neural network-based cross-correlation model (Deep-CCM) is constructed not only to model the extracted feature hierarchies of the CNN independently, but also to relate the statistical dependencies of paired features from different layers. Then, a Bayesian error weighting scheme for classifier combination is adopted to explore the contributions of individual Deep-CCM classifiers and thereby enhance the accuracy of shot classification in each image frame. Third, we feed the frame-wise classification results to a linear-chain conditional random field module that refines the shot predictions by taking into account global and temporal regularities. We provide extensive experimental results on a data set of live concert videos to demonstrate the advantage of the proposed CC-Net over existing popular fusion approaches for shot classification.
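The last two stages described in the summary (combining the outputs of several frame-wise classifiers with error-based weights, then refining the per-frame decisions with a linear-chain conditional random field) can be illustrated with a minimal sketch. This is not the authors' implementation and rests on several assumptions: weights proportional to each classifier's validation accuracy serve as a simple stand-in for the paper's Bayesian error weighting; the CRF transition scores are hand-set rather than learned; the CNN feature fusion and Deep-CCM classifiers are not modeled (their per-frame class posteriors are taken as given); and the function names `error_weights`, `fuse_frame`, `viterbi` and the three shot labels are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical shot labels for illustration only; not the paper's label set.
LABELS = ["long", "medium", "close-up"]


def error_weights(error_rates: Sequence[float]) -> List[float]:
    """Stand-in weighting (assumption): weight each classifier by its
    validation accuracy (1 - error rate), normalized to sum to 1."""
    acc = [1.0 - e for e in error_rates]
    total = sum(acc)
    return [a / total for a in acc]


def fuse_frame(posteriors: Sequence[Sequence[float]],
               weights: Sequence[float]) -> List[float]:
    """Weighted sum of per-classifier class posteriors for a single frame."""
    n_classes = len(posteriors[0])
    return [sum(w * p[c] for w, p in zip(weights, posteriors))
            for c in range(n_classes)]


def viterbi(unary: Sequence[Sequence[float]],
            transition: Sequence[Sequence[float]]) -> List[int]:
    """Highest-scoring label sequence of a linear-chain model.

    unary[t][c]: log-score of class c at frame t.
    transition[i][j]: log-score of moving from class i to class j.
    """
    n_frames, n_classes = len(unary), len(unary[0])
    score = list(unary[0])
    back = []  # back[t-1][j]: best predecessor of class j at frame t
    for t in range(1, n_frames):
        new_score, ptr = [], []
        for j in range(n_classes):
            best_i = max(range(n_classes),
                         key=lambda i: score[i] + transition[i][j])
            new_score.append(score[best_i] + transition[best_i][j] + unary[t][j])
            ptr.append(best_i)
        score = new_score
        back.append(ptr)
    # Backtrack from the best final class.
    path = [max(range(n_classes), key=lambda c: score[c])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Usage: two hypothetical classifiers, three frames; the middle frame is ambiguous.
weights = error_weights([0.2, 0.4])
frames = [
    [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],
    [[0.3, 0.6, 0.1], [0.4, 0.5, 0.1]],
    [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1]],
]
fused = [fuse_frame(f, weights) for f in frames]
unary = [[math.log(p) for p in probs] for probs in fused]
# Hypothetical transition log-scores that favor keeping the same shot type.
transition = [[math.log(0.8) if i == j else math.log(0.1) for j in range(3)]
              for i in range(3)]
smoothed = viterbi(unary, transition)
print([LABELS[c] for c in smoothed])  # ['long', 'long', 'long']
```

Taking the per-frame argmax of the fused scores would label the middle frame "medium"; the transition scores, which penalize switching shot type between adjacent frames, pull it back to "long". Viterbi decoding returns the exact maximum-scoring sequence of a linear-chain model in O(T·C²) time for T frames and C classes.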
ISSN:	1520-9210 (print); 1941-0077 (electronic)
DOI:	10.1109/TMM.2018.2820904