Exploring Deep Fusion Ensembling for Automatic Visual Interestingness Prediction

Bibliographic Details
Published in Human Perception of Visual Information pp. 33 - 58
Main Authors Constantin, Mihai Gabriel; Ştefan, Liviu-Daniel; Ionescu, Bogdan
Format Book Chapter
Language English
Published Switzerland Springer International Publishing AG 2021
Springer International Publishing
Summary: In the context of the ever-growing quantity of multimedia content from social, news and educational platforms, generating meaningful recommendations and ratings now requires a more advanced understanding of its impact on the user, such as the user’s subjective perception. One of the important subjective concepts explored by researchers is visual interestingness. While several definitions of this concept are given in the current literature, in a broader sense, this property attempts to measure the ability of audio-visual data to capture and keep the viewer’s attention for longer periods of time. While many computer vision and machine learning methods have been tested for predicting media interestingness, overall, due to the heavily subjective nature of interestingness, the precision of the results is relatively low. In this chapter, we investigate several methods that address this problem from a different angle. We first review the literature on interestingness prediction and present an overview of the traditional fusion mechanisms, such as statistical fusion, weighted approaches, boosting, random forests or randomized trees. Further, we explore the possibility of employing a stronger, novel, deep learning-based system fusion to enhance performance. We investigate several types of deep networks for creating the fusion systems, including dense, attention, convolutional and cross-space-fusion networks, while also proposing some input decoration methods that help these networks achieve optimal performance. We present the results, as well as an analysis of the correlation between network structure and overall system performance. Experimental validation is carried out on a publicly available data set and on the systems benchmarked during the 2017 MediaEval Predicting Media Interestingness task.
ISBN: 9783030814649
3030814645
DOI: 10.1007/978-3-030-81465-6_2