Sentiment analysis using image-based deep spectrum features

Bibliographic Details
Published in: 2017 Seventh International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), pp. 26-29
Main Authors: Amiriparian, Shahin; Cummins, Nicholas; Ottl, Sandra; Gerczuk, Maurice; Schuller, Bjorn
Format: Conference Proceeding
Language: English
Published: IEEE, 01.10.2017

Summary: We test the suitability of our novel deep spectrum feature representation for performing speech-based sentiment analysis. Deep spectrum features are formed by passing spectrograms through a pre-trained image convolutional neural network (CNN) and have been shown to capture useful emotion information in speech; however, their usefulness for sentiment analysis is yet to be investigated. Using a data set of movie reviews collected from YouTube, we compare deep spectrum features combined with the bag-of-audio-words (BoAW) paradigm with a state-of-the-art Mel Frequency Cepstral Coefficients (MFCC) based BoAW system when performing a binary sentiment classification task. Key results presented indicate the suitability of both features for the proposed task. The deep spectrum features achieve an unweighted average recall of 74.5 %. The results provide further evidence for the effectiveness of deep spectrum features as a robust feature representation for speech analysis.
DOI: 10.1109/ACIIW.2017.8272618
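
The feature extraction step described in the summary (a spectrogram rendered as an image, passed through a pre-trained image CNN, with an internal activation vector used as the feature) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' exact toolchain: the choice of librosa for the spectrogram, a torchvision AlexNet as the image CNN, the fc7 layer, the viridis colour map, and the helper name deep_spectrum_features are all assumptions made here for illustration.

import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt
import torch
import torchvision.models as models
import torchvision.transforms as transforms

def deep_spectrum_features(wav_path):
    """Return an illustrative deep spectrum feature vector for one audio file."""
    # 1. Compute a log-mel spectrogram of the speech signal.
    y, sr = librosa.load(wav_path, sr=16000)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    log_mel = librosa.power_to_db(mel, ref=np.max)

    # 2. Render the spectrogram as an RGB image, since the CNN expects image input.
    fig, ax = plt.subplots(figsize=(2.24, 2.24), dpi=100)
    ax.axis("off")
    librosa.display.specshow(log_mel, sr=sr, cmap="viridis", ax=ax)
    fig.canvas.draw()
    img = np.asarray(fig.canvas.buffer_rgba())[:, :, :3]  # drop alpha channel
    plt.close(fig)

    # 3. Forward the image through a pre-trained image CNN (AlexNet assumed here,
    #    torchvision >= 0.13) and keep the activations of the second fully
    #    connected layer as the feature vector.
    cnn = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()
    preprocess = transforms.Compose([
        transforms.ToPILImage(),
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    x = preprocess(img).unsqueeze(0)
    with torch.no_grad():
        h = cnn.avgpool(cnn.features(x))
        feats = cnn.classifier[:6](torch.flatten(h, 1))  # 4096-dimensional vector
    return feats.squeeze(0).numpy()

In the paper's pipeline, such per-clip feature vectors are additionally quantised with a bag-of-audio-words codebook before the binary sentiment classifier is trained; that step and the MFCC baseline are omitted from this sketch.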