Zero-Shot Speech Emotion Recognition Using Generative Learning with Reconstructed Prototypes

Bibliographic Details
Published in	ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5
Main Authors	Xu, Xinzhou; Deng, Jun; Zhang, Zixing; Yang, Zhen; Schuller, Björn W.
Format	Conference Proceeding
Language	English
Published	IEEE 04.06.2023
Subjects	Acoustics; Affective computing; Emotion recognition; emotional prototypes; generative learning; Prototypes; Semantics; Signal processing; Speech emotion recognition; Speech recognition; zero-shot learning

Summary:	Zero-shot Speech Emotion Recognition (SER) enables machines to recognize speech from unseen emotional states without having observed any samples of those states, which is helpful in audio-based autonomous affective computing. However, existing works on zero-shot SER directly employ original prototypes and only consider inter-domain knowledge transfer through learning unseen-emotional classifiers. To address this, we propose a zero-shot SER approach using generative learning with reconstructed prototypes. Within the proposed approach, we first reconstruct prototypes using the alignment from paralinguistic features to semantic prototypes. Then, generative learning is performed to build the connection from the reconstructed prototypes to the features. Zero-shot experiments on emotional-speech data demonstrate that the proposed approach achieves better performance than state-of-the-art approaches.
ISSN:	2379-190X
DOI:	10.1109/ICASSP49357.2023.10094888