Audio Visual Emotion Recognition Using Cross Correlation and Wavelet Packet Domain Features


Bibliographic Details
Published in: 2017 IEEE International WIE Conference on Electrical and Computer Engineering (WIECON-ECE), pp. 233-236
Main Authors: Noor, Shamman; Dhrubo, Ehsan Ahmed; Minhaz, Ahmed Tahseen; Shahnaz, Celia; Fattah, Shaikh Anowarul
Format: Conference Proceeding
Language: English
Published: IEEE, 01.12.2017
DOI: 10.1109/WIECON-ECE.2017.8468871


More Information
Summary: The better a machine recognizes non-verbal modes of communication, such as emotion, the better the quality of human-machine interaction that can be achieved. This paper describes a method for recognizing emotions from human speech and visual data. For feature extraction, videos covering six classes of emotion (happy, sad, fear, disgust, angry, and surprise) from 44 different subjects in the eNTERFACE05 database are used. As video features, Horizontal and Vertical Cross-Correlation (HCCR and VCCR) signals, extracted from the eye and mouth regions, are used. As speech features, Perceptual Linear Predictive Coefficients (PLPC) and Mel-Frequency Cepstral Coefficients (MFCC) extracted from wavelet packet coefficients are used in conjunction with PLPC and MFCC extracted from the original signal. For each feature type, K-Nearest Neighbour (KNN) multiclass classification is applied separately to identify the emotion expressed in speech and through facial movement. The emotion expressed in a video file is then identified by concatenating the speech and video features and applying KNN classification.
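The fusion step described in the summary, concatenating the speech and video feature vectors for a clip and classifying the result with KNN, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy feature vectors, the `fuse` helper, and the choice of k = 3 are all hypothetical, and a real pipeline would first extract HCCR/VCCR and MFCC/PLPC features.

```python
import math


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points
    under Euclidean distance (plain multiclass KNN)."""
    ranked = sorted(
        (math.dist(x, tx), label) for tx, label in zip(train_X, train_y)
    )
    votes = {}
    for _, label in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def fuse(audio_feat, video_feat):
    """Feature-level fusion: simple concatenation of the two vectors."""
    return audio_feat + video_feat


# Hypothetical per-clip features (audio vector, video vector) for two
# of the six emotion classes; real vectors would be much longer.
train_X = [
    fuse([0.90, 0.10], [0.80, 0.20]),  # happy
    fuse([0.85, 0.15], [0.75, 0.30]),  # happy
    fuse([0.10, 0.90], [0.20, 0.85]),  # sad
    fuse([0.15, 0.80], [0.25, 0.90]),  # sad
]
train_y = ["happy", "happy", "sad", "sad"]

query = fuse([0.80, 0.20], [0.70, 0.25])
print(knn_predict(train_X, train_y, query, k=3))  # -> happy
```

The same `knn_predict` can also be run on the audio-only or video-only vectors, mirroring the paper's separate per-modality classification before the concatenated (fused) run.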