Semisupervised Autoencoders for Speech Emotion Recognition

Bibliographic Details
Published in	IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 26, no. 1, pp. 31-43
Main Authors	Jun Deng, Xinzhou Xu, Zixing Zhang, Sascha Fruhholz, Bjorn Schuller
Format	Journal Article
Language	English
Published	IEEE 01.01.2018
Subjects	Autoencoders; Emotion recognition; Semi-supervised learning; Semisupervised learning; Speech; Speech emotion recognition; Speech processing; Speech recognition; Supervised learning; Training
Summary:	Despite the widespread use of supervised learning methods for speech emotion recognition, they are severely restricted by the lack of sufficient labelled speech data for training. Given the wide availability of unlabelled speech data, this paper proposes semisupervised autoencoders to improve speech emotion recognition. The aim is to benefit from combining labelled and unlabelled data. The proposed model extends a popular unsupervised autoencoder by carefully adjoining a supervised learning objective. We extensively evaluate the proposed model on the INTERSPEECH 2009 Emotion Challenge database and four other public databases in different scenarios. Experimental results demonstrate that the proposed model achieves state-of-the-art performance with a very small amount of labelled data on the challenge task and other tasks, and significantly outperforms alternative methods.
ISSN:	2329-9290 2329-9304
DOI:	10.1109/TASLP.2017.2759338