Audio Visual Emotion Recognition Using Cross Correlation and Wavelet Packet Domain Features


Bibliographic Details
Published in: 2017 IEEE International WIE Conference on Electrical and Computer Engineering (WIECON-ECE), pp. 233-236
Main Authors: Noor, Shamman; Dhrubo, Ehsan Ahmed; Minhaz, Ahmed Tahseen; Shahnaz, Celia; Fattah, Shaikh Anowarul
Format: Conference Proceeding
Language: English
Published: IEEE, 01.12.2017
DOI: 10.1109/WIECON-ECE.2017.8468871


More Information
Summary: The better a machine recognizes non-verbal modes of communication, such as emotion, the better the quality of human-machine interaction that can be achieved. This paper describes a method for recognizing emotions from human speech and visual data. For feature extraction, videos covering six classes of emotion (happy, sad, fear, disgust, angry, and surprise) from 44 different subjects in the eNTERFACE05 database are used. As video features, Horizontal and Vertical Cross-Correlation (HCCR and VCCR) signals, extracted from the eye and mouth regions, are used. As speech features, Perceptual Linear Predictive Coefficients (PLPC) and Mel-Frequency Cepstral Coefficients (MFCC) extracted from wavelet packet coefficients are used in conjunction with PLPC and MFCC extracted from the original signal. For each feature type, K-Nearest Neighbour (KNN) multiclass classification is applied separately to identify the emotion expressed in speech and through facial movement. The emotion expressed in a video file is then identified by concatenating the speech and video features and applying KNN classification.
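The fusion step described in the summary, concatenating the speech and video feature vectors for a clip and classifying the result with KNN, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy feature vectors, the `fuse` helper, and the choice of k = 3 are all hypothetical, and a real pipeline would first extract HCCR/VCCR and MFCC/PLPC features.

```python
import math


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points
    under Euclidean distance (plain multiclass KNN)."""
    ranked = sorted(
        (math.dist(x, tx), label) for tx, label in zip(train_X, train_y)
    )
    votes = {}
    for _, label in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def fuse(audio_feat, video_feat):
    """Feature-level fusion: simple concatenation of the two vectors."""
    return audio_feat + video_feat


# Hypothetical per-clip features (audio vector, video vector) for two
# of the six emotion classes; real vectors would be much longer.
train_X = [
    fuse([0.90, 0.10], [0.80, 0.20]),  # happy
    fuse([0.85, 0.15], [0.75, 0.30]),  # happy
    fuse([0.10, 0.90], [0.20, 0.85]),  # sad
    fuse([0.15, 0.80], [0.25, 0.90]),  # sad
]
train_y = ["happy", "happy", "sad", "sad"]

query = fuse([0.80, 0.20], [0.70, 0.25])
print(knn_predict(train_X, train_y, query, k=3))  # -> happy
```

The same `knn_predict` can also be run on the audio-only or video-only vectors, mirroring the paper's separate per-modality classification before the concatenated (fused) run.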