Beyond Vision: A Multimodal Recurrent Attention Convolutional Neural Network for Unified Image Aesthetic Prediction Tasks

Bibliographic Details
Published in IEEE Transactions on Multimedia, Vol. 23, pp. 611-623
Main Authors Zhang, Xiaodan, Gao, Xinbo, Lu, Wen, He, Lihuo, Li, Jie
Format Journal Article
Language English
Published Piscataway IEEE 2021
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Over the past few years, image aesthetic prediction has attracted increasing attention because of its wide range of applications, such as image retrieval, photo album management, and aesthetic-driven image enhancement. However, previous studies in this area have achieved only limited success because (1) they depend primarily on visual features and ignore textual information, and (2) they tend to attend equally to every part of an image and ignore the selective attention mechanism. This paper overcomes these limitations by proposing a novel multimodal recurrent attention convolutional neural network (MRACNN). More specifically, the MRACNN consists of two streams: a vision stream and a language stream. The former employs a recurrent attention network to tune out irrelevant information and focus on key regions when extracting visual features. The latter uses Text-CNN to capture the high-level semantics of user comments. Finally, a multimodal factorized bilinear (MFB) pooling approach fuses the textual and visual features. Extensive experiments demonstrate that the proposed MRACNN significantly outperforms state-of-the-art methods on the unified aesthetic prediction tasks: (i) aesthetic quality classification; (ii) aesthetic score regression; and (iii) aesthetic score distribution prediction.
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2020.2985526
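
The summary names multimodal factorized bilinear (MFB) pooling as the fusion step but gives no detail. The sketch below illustrates the commonly described form of MFB pooling: project both feature vectors into a shared space, multiply element-wise, sum-pool over windows of size k, then apply power and L2 normalization. It is not the authors' implementation. The function name `mfb_pool`, the dimensions, and the random projection matrices are assumptions made for illustration; in a trained model the projections would be learned, and the record does not say which normalization the paper uses.

```python
import math
import random


def mfb_pool(x, y, U, V, k):
    """Fuse feature vectors x and y with factorized bilinear pooling.

    U has len(x) rows and k * o columns, V has len(y) rows and k * o
    columns (both stored as lists of rows). The result has length o.
    """
    width = len(U[0])
    assert width % k == 0 and len(V[0]) == width
    # Project each modality into the shared (k * o)-dimensional space.
    px = [sum(x[i] * U[i][j] for i in range(len(x))) for j in range(width)]
    py = [sum(y[i] * V[i][j] for i in range(len(y))) for j in range(width)]
    # Element-wise product, then sum pooling over non-overlapping windows of size k.
    joint = [a * b for a, b in zip(px, py)]
    z = [sum(joint[s:s + k]) for s in range(0, width, k)]
    # Signed square root (power normalization) followed by L2 normalization.
    z = [math.copysign(math.sqrt(abs(v)), v) for v in z]
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z] if norm > 0 else z


# Hypothetical sizes: 6-d visual feature, 4-d text feature, k = 3, output size o = 5.
random.seed(0)
dx, dy, k, o = 6, 4, 3, 5
visual = [random.gauss(0, 1) for _ in range(dx)]
textual = [random.gauss(0, 1) for _ in range(dy)]
U = [[random.gauss(0, 1) for _ in range(k * o)] for _ in range(dx)]
V = [[random.gauss(0, 1) for _ in range(k * o)] for _ in range(dy)]
fused = mfb_pool(visual, textual, U, V, k)
```

The fused vector has length o and unit L2 norm, so a downstream classifier or regressor sees features on a consistent scale regardless of the magnitudes of the two input streams.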