Speech Emotion Analysis: Exploring the Role of Context

Bibliographic Details
Published in	IEEE Transactions on Multimedia, Vol. 12, no. 6, pp. 502-509
Main Authors	Tawari, Ashish; Trivedi, Mohan Manubhai
Format	Journal Article
Language	English
Published	Piscataway: IEEE, 01.10.2010; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Accuracy; Affect analysis; affective computing; Audiovisual; Channels; Collection; Context; context analysis; emotion intelligence; Emotion recognition; emotional speech; Emotions; Feature extraction; Human; Recognition; Speech; Speech recognition; Studies; vocal expression

More Information
Summary:	Automated analysis of human affective behavior has attracted increasing attention in recent years. With the research shift toward spontaneous behavior, many challenges have surfaced, ranging from database collection strategies to the use of new feature sets (e.g., lexical cues in addition to prosodic features). The use of contextual information, however, is rarely addressed in the field of affect expression recognition, even though affect recognition by humans is evidently influenced to a large extent by context. Our contribution in this paper is threefold. First, we introduce a novel set of features based on cepstrum analysis of pitch and intensity contours. We evaluate the usefulness of these features on two different databases: the Berlin Database of Emotional Speech (EMO-DB) and a locally collected audiovisual database in car settings (CVRRCar-AVDB). The overall recognition accuracy, based on tenfold stratified cross-validation, is over 84% for seven emotions in EMO-DB and over 87% for three emotion classes in CVRRCar-AVDB. Second, we introduce the collection of a new audiovisual database in an automobile setting (CVRRCar-AVDB). In the current study, we use only the audio channel of this database. Third, we systematically analyze the effects of different contexts on the two databases. We present a context analysis of subject and text based on speaker-dependent/-independent and text-dependent/-independent analysis on EMO-DB. Furthermore, we perform a context analysis based on gender information on EMO-DB and CVRRCar-AVDB. The results of these analyses are promising.
ISSN:	1520-9210, 1941-0077
DOI:	10.1109/TMM.2010.2058095