PDANet: Polarity-consistent Deep Attention Network for Fine-grained Visual Emotion Regression


Bibliographic Details
Published in arXiv.org
Main Authors Zhao, Sicheng; Jia, Zizhou; Chen, Hui; Li, Leida; Ding, Guiguang; Keutzer, Kurt
Format Paper Journal Article
Language English
Published Ithaca Cornell University Library, arXiv.org 11.09.2019
Summary: Existing methods for visual emotion analysis mainly focus on coarse-grained emotion classification, i.e., assigning a dominant discrete emotion category to an image. However, these methods cannot adequately reflect the complexity and subtlety of emotions. In this paper, we study the fine-grained regression problem of visual emotions based on convolutional neural networks (CNNs). Specifically, we develop a Polarity-consistent Deep Attention Network (PDANet), a novel network architecture that integrates attention into a CNN with an emotion polarity constraint. First, we propose to incorporate both spatial and channel-wise attention into a CNN for visual emotion regression, which jointly considers the local spatial connectivity patterns along each channel and the interdependency between different channels. Second, we design a novel regression loss, the polarity-consistent regression (PCR) loss, based on weakly supervised emotion polarity to guide the attention generation. By optimizing the PCR loss, PDANet can generate a polarity-preserved attention map and thus improve emotion regression performance. Extensive experiments are conducted on the IAPS, NAPS, and EMOTIC datasets, and the results demonstrate that the proposed PDANet outperforms state-of-the-art approaches by a large margin for fine-grained visual emotion regression. Our source code is released at: https://github.com/ZizhouJia/PDANet.
ISSN: 2331-8422
DOI: 10.48550/arxiv.1909.05693
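
The summary describes a regression loss that takes emotion polarity into account but does not give its formula. The following is only a minimal sketch of one way such a loss could look, not the authors' PCR loss: it assumes a mean squared error whose per-sample term is weighted more heavily when the predicted and target values fall on opposite sides of a neutral point. The function names, the `neutral` midpoint of 5.0, and the `penalty` weight of 0.5 are all hypothetical choices made for illustration; the released repository should be consulted for the actual definition.

```python
from typing import Sequence


def polarity(value: float, neutral: float) -> int:
    """Return +1 above the neutral point, -1 below it, and 0 exactly at it."""
    if value > neutral:
        return 1
    if value < neutral:
        return -1
    return 0


def pcr_loss_sketch(
    predicted: Sequence[float],
    target: Sequence[float],
    neutral: float = 5.0,  # hypothetical midpoint of the rating scale
    penalty: float = 0.5,  # hypothetical extra weight for polarity mismatches
) -> float:
    """Polarity-weighted mean squared error (illustrative assumption only).

    Each squared error is multiplied by (1 + penalty) when the prediction and
    the target have opposite polarity relative to `neutral`, and by 1 otherwise.
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one sample is required")

    total = 0.0
    for p, t in zip(predicted, target):
        squared_error = (p - t) ** 2
        # Opposite signs multiply to a negative number; a value sitting
        # exactly on the neutral point is never counted as a mismatch.
        mismatch = polarity(p, neutral) * polarity(t, neutral) < 0
        weight = 1.0 + penalty if mismatch else 1.0
        total += weight * squared_error
    return total / len(predicted)


if __name__ == "__main__":
    # Same polarity: plain squared error.
    print(pcr_loss_sketch([6.0], [7.0]))
    # Opposite polarity: squared error scaled by 1 + penalty.
    print(pcr_loss_sketch([4.0], [6.0]))
```

Under these assumptions, two predictions with the same absolute error receive different losses depending on whether they preserve the target's polarity, which is the general behavior the summary attributes to a polarity-consistent loss.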