Speech-based recognition of self-reported and observed emotion in a dimensional space

Bibliographic Details
Published in	Speech Communication, Vol. 54, no. 9, pp. 1049-1063
Main Authors	Truong, Khiet P., van Leeuwen, David A., de Jong, Franciska M.G.
Format	Journal Article
Language	English
Published	Elsevier B.V., 01.11.2012
Subjects	Affective computing; Arousal; Audiovisual database; Automatic emotion recognition; Emotion annotation; Emotion database; Emotion elicitation; Emotion perception; Emotional speech; Emotions; Mathematical analysis; Mathematical models; Observers; Ratings; Recognition; Regression; Support Vector Regression; Videogames
Summary:
► Exploration of the use of self-reported emotion ratings for automatic affect recognition.
► Better recognition performance is obtained with observed emotion ratings than self-reported ratings.
► Averaging emotion ratings from multiple annotators improves performance.
► Valence is better recognized with lexical than acoustic features.
The differences between self-reported and observed emotion have only marginally been investigated in the context of speech-based automatic emotion recognition. We address this issue by comparing self-reported emotion ratings to observed emotion ratings and look at how differences between these two types of ratings affect the development and performance of automatic emotion recognizers developed with these ratings. A dimensional approach to emotion modeling is adopted: the ratings are based on continuous arousal and valence scales. We describe the TNO-Gaming Corpus that contains spontaneous vocal and facial expressions elicited via a multiplayer videogame and that includes emotion annotations obtained via self-report and observation by outside observers. Comparisons show that there are discrepancies between self-reported and observed emotion ratings which are also reflected in the performance of the emotion recognizers developed. Using Support Vector Regression in combination with acoustic and textual features, recognizers of arousal and valence are developed that can predict points in a 2-dimensional arousal-valence space. The results of these recognizers show that the self-reported emotion is much harder to recognize than the observed emotion, and that averaging ratings from multiple observers improves performance.
ISSN:	0167-6393, 1872-7182
DOI:	10.1016/j.specom.2012.04.006
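
The summary notes that averaging ratings from multiple observers improves recognizer performance. As a minimal sketch of that label-averaging step only (not the authors' Support Vector Regression recognizers), the Python snippet below collapses several observers' ratings into one arousal-valence target per segment. The segment names, the rating values, the -1 to 1 scale, and the function name `average_ratings` are all assumptions made for illustration; none of them come from the article or the TNO-Gaming Corpus.

```python
from statistics import mean

# Hypothetical data: each segment has (arousal, valence) pairs from several
# observers. The values and the -1..1 scale are invented for illustration.
observed = {
    "seg1": [(0.6, 0.2), (0.8, 0.0), (0.7, 0.4)],
    "seg2": [(-0.2, -0.5), (0.0, -0.3), (-0.4, -0.7)],
}


def average_ratings(ratings):
    """Average (arousal, valence) pairs from multiple observers into one point."""
    arousal = mean(a for a, _ in ratings)
    valence = mean(v for _, v in ratings)
    return arousal, valence


# One averaged point per segment in the 2-dimensional arousal-valence space.
targets = {seg: average_ratings(r) for seg, r in observed.items()}

for seg, (a, v) in targets.items():
    print(f"{seg}: arousal={a:.2f}, valence={v:.2f}")
```

Averaged points of this kind could then serve as regression targets, with one regressor for arousal and one for valence.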