Automatic discrimination between laughter and speech

Bibliographic Details
Published in	Speech communication Vol. 49; no. 2; pp. 144 - 158
Main Authors	Truong, Khiet P., van Leeuwen, David A.
Format	Journal Article
Language	English
Published	Amsterdam: Elsevier B.V.; Elsevier : North-Holland, 01.02.2007
Subjects	Applied sciences | Artificial intelligence | Automatic detection emotion | Automatic detection laughter | Computer science; control theory; systems | Connectionism. Neural networks | Exact sciences and technology | Information, signal and communications theory | Signal and communications theory | Signal processing | Signal representation. Spectral analysis | Signal, noise | Speech processing | Telecommunications and information theory | Performance evaluation | Emotion recognition | Discriminant analysis | Automatic classification | Acoustic measurement | Acoustic signal | Sex | Error rate | Mixture theory | Support vector machine | Speaker recognition | Signal classification | Intonation | Discrimination | Gaussian process | Prosody | Automatic measurement | Multilayer perceptrons | Automatic recognition | Pitch (acoustics) | Language recognition | Physical Sciences

More Information
Summary:	Emotions can be recognized from audible paralinguistic cues in speech. Detecting these cues, which include laughter, a trembling voice, coughs, and changes in the intonation contour, can reveal information about the speaker’s state and emotion. This paper describes the development of a gender-independent laugh detector with the aim of enabling automatic emotion recognition. Different types of features (spectral, prosodic) for laughter detection were investigated using different classification techniques (Gaussian Mixture Models, Support Vector Machines, Multi-Layer Perceptrons) often used in language and speaker recognition. Classification experiments were carried out with short pre-segmented speech and laughter segments (mean duration of approximately 2 s) extracted from the ICSI Meeting Recorder Corpus. Equal error rates of around 3% were obtained when tested on speaker-independent speech data. We found that fusing classifiers based on Gaussian Mixture Models with classifiers based on Support Vector Machines increases discriminative power. We also found that fusing classifiers that use spectral features with classifiers that use prosodic information usually improves discrimination between laughter and speech. Our acoustic measurements showed differences between laughter and speech in mean pitch and in the ratio of the durations of unvoiced to voiced portions, which indicate that these prosodic features are indeed useful for discriminating between laughter and speech.
ISSN:	0167-6393 1872-7182
DOI:	10.1016/j.specom.2007.01.001
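
Note on the Summary: the abstract reports equal error rates (EER) and gains from fusing classifiers, but this record does not say how either was computed. The sketch below is only an illustration of the two ideas, not the authors' method. It assumes each classifier emits one score per segment (higher meaning "more laughter-like"), approximates the EER by sweeping thresholds, and fuses two systems by a weighted sum of z-normalized scores. The function names, the fusion rule, and the synthetic Gaussian scores are all hypothetical.

```python
import random
import statistics


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: sweep thresholds over the observed scores and return
    the error rate where miss and false-alarm rates are closest."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # A segment is accepted as laughter when its score is >= t.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer


def z_norm(scores, reference):
    """Shift and scale scores using the mean and std of a reference set."""
    mu = statistics.mean(reference)
    sigma = statistics.pstdev(reference)
    return [(s - mu) / sigma for s in scores]


def fuse(scores_a, scores_b, weight=0.5):
    """Score-level fusion by weighted sum (an assumed rule, for illustration)."""
    return [weight * a + (1 - weight) * b for a, b in zip(scores_a, scores_b)]


def demo(n=2000, seed=0):
    """Synthetic stand-in for two classifiers with independent noise."""
    rng = random.Random(seed)
    # Hypothetical scores: laughter centered at +1, speech at -1, unit noise.
    laugh_a = [rng.gauss(1.0, 1.0) for _ in range(n)]
    speech_a = [rng.gauss(-1.0, 1.0) for _ in range(n)]
    laugh_b = [rng.gauss(1.0, 1.0) for _ in range(n)]
    speech_b = [rng.gauss(-1.0, 1.0) for _ in range(n)]

    # Normalize each system with its own pooled scores before fusing.
    ref_a, ref_b = laugh_a + speech_a, laugh_b + speech_b
    laugh_f = fuse(z_norm(laugh_a, ref_a), z_norm(laugh_b, ref_b))
    speech_f = fuse(z_norm(speech_a, ref_a), z_norm(speech_b, ref_b))

    return (
        equal_error_rate(laugh_a, speech_a),
        equal_error_rate(laugh_b, speech_b),
        equal_error_rate(laugh_f, speech_f),
    )


if __name__ == "__main__":
    eer_a, eer_b, eer_fused = demo()
    print(f"EER A: {eer_a:.3f}  EER B: {eer_b:.3f}  EER fused: {eer_fused:.3f}")
```

In this toy setup the two systems make independent errors, so averaging their scores reduces the noise and the fused EER drops. That mirrors the abstract's observation only qualitatively: real spectral and prosodic classifiers are correlated, and the size of any gain depends on the data.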