Relationship between speech entrainment and emotion

Bibliographic Details
Published in: 2022 10th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), pp. 1-4
Main Author: Kejriwal, Jay
Format: Conference Proceeding
Language: English
Published: IEEE, 18.10.2022
Summary: Entrainment in spoken dialogue interaction is the tendency of a speaker to adjust some properties of their speech to match the interlocutor's characteristics. It has been observed in multiple dimensions of spoken interaction, including acoustic-prosodic features [1], linguistic style [2], and syntactic structure [3]. Similarly, emotional entrainment covers phenomena in which speakers adjust their emotional states based on their interlocutor's emotional responses. It is defined as the synchronous convergence of human emotions [4]. There is also evidence of emotional entrainment in everyday life. For example, sharing good news between two best friends will make both of them happy, whereas a heated argument between the two will result in a bad temper [4]. Despite its importance, emotional entrainment in speech remains underexplored; only a few studies [4], [5] have examined it, and those used the textual modality, which shows the potential for further research. One of the broad goals of my dissertation is to equip machines that interact with humans through speech with emotional intelligence by adding emotional entrainment as a functionality. This functionality will allow machines to adjust their spoken responses dynamically based on their interlocutors' emotional responses. In my current research, I focus on understanding the underlying patterns of emotional entrainment in the speech modality. Three partial goals in the present study lead toward the broad goal described above. Studies on emotions are mainly conducted in experimental setups in which trained actors are employed to produce artificial emotions [6]. Variation in acoustic/prosodic features under different emotions has been widely studied in acted speech corpora [7]. However, these speech variations remain less explored for emotions captured in naturalistic dialogue.
The first goal is to better understand the relationship between individual emotions captured in naturalistic settings and acoustic/prosodic (a/p) features such as pitch, intensity, and speech rate. Researchers have reported inconsistent results for speech entrainment across different features or dimensions, even when using the same corpus and applying the same methodology. In attempts to identify the origins of these variations, some authors focused on gender, but the results were mixed [8]. Given these findings, it is important to understand whether entrainment is affected by the emotional states of the speakers in a dyad. The second goal is to explore speech variation across the two speakers of a dyad. This variation involves both the a/p features and the speakers' respective emotional states. Speech emotion recognition (SER) is considered an advanced field in which machines can identify the emotional state of the user. However, machines are unable to moderate their spoken responses based on their interlocutors' responses, which sometimes results in monotonous speech that lacks expressive and emotional characteristics. The final goal is to develop a classification model for predicting the emotional state of an interlocutor, which will allow machines to adjust their emotional responses dynamically.
DOI: 10.1109/ACIIW57231.2022.10086027