Recognizing emotion from singing and speaking using shared models


Bibliographic Details
Published in: International Conference on Affective Computing and Intelligent Interaction and Workshops, pp. 139-145
Main Authors: Zhang, Biqiao; Essl, Georg; Provost, Emily Mower
Format: Conference Proceeding; Journal Article
Language: English
Published: IEEE, 01.09.2015
Summary: Speech and song are two types of vocal communication that are closely related to each other. While significant progress has been made in both speech and music emotion recognition, few works have concentrated on building a shared emotion recognition model for both speech and song. In this paper, we propose three shared emotion recognition models for speech and song: a simple model, a single-task hierarchical model, and a multi-task hierarchical model. We study the commonalities and differences present in emotion expression across these two communication domains. We compare performance across different settings, investigate the relationship between evaluator agreement rate and classification accuracy, and analyze the classification performance of individual feature groups. Our results show that the multi-task model classifies emotion more accurately than single-task models when the same set of features is used. This suggests that although spoken and sung emotion recognition are different tasks, they are related and can be considered together. The results demonstrate that utterances with lower agreement rates and emotions with low activation benefit the most from multi-task learning. Visual features appear to be more similar across spoken and sung emotion expression than acoustic features.
ISSN: 2156-8111
DOI: 10.1109/ACII.2015.7344563
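
The summary above contrasts single-task models with a multi-task model that shares information between spoken and sung emotion recognition. The sketch below is not taken from the paper; it only illustrates the common shape of such a multi-task classifier: a shared trunk trained on both domains plus a separate output head per domain. The feature dimensionality, layer sizes, number of emotion classes, and the use of PyTorch are all assumptions made for the example.

```python
# Illustrative sketch only (not the authors' code): a shared representation
# with domain-specific heads for speech and song, the usual layout of a
# multi-task emotion classifier. All sizes below are assumed for the example.
import torch
import torch.nn as nn

class SharedEmotionModel(nn.Module):
    def __init__(self, num_features=384, hidden=64, num_emotions=4):
        super().__init__()
        # Shared layers learn a representation common to both domains.
        self.shared = nn.Sequential(
            nn.Linear(num_features, hidden),
            nn.ReLU(),
        )
        # Domain-specific heads capture what differs between speech and song.
        self.speech_head = nn.Linear(hidden, num_emotions)
        self.song_head = nn.Linear(hidden, num_emotions)

    def forward(self, x, domain):
        h = self.shared(x)
        return self.speech_head(h) if domain == "speech" else self.song_head(h)

# Usage: route each batch through the head matching its domain label; the
# gradients from both tasks jointly update the shared layers.
model = SharedEmotionModel()
speech_batch = torch.randn(8, 384)
logits = model(speech_batch, domain="speech")
```

The design choice this sketch reflects is the one argued for in the summary: because the two tasks are related, letting them share parameters can help, especially for harder cases such as low-agreement utterances and low-activation emotions.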