DCNN and DNN based multi-modal depression recognition

Bibliographic Details
Published in: International Conference on Affective Computing and Intelligent Interaction and Workshops (ACII), pp. 484-489
Main Authors: Yang, Le; Jiang, Dongmei; Han, Wenjing; Sahli, Hichem
Format: Conference Proceeding
Language: English
Published: IEEE, 01.10.2017
ISSN: 2156-8111
DOI: 10.1109/ACII.2017.8273643

Summary: In this paper, we propose an audio-visual multi-modal depression recognition framework composed of deep convolutional neural network (DCNN) and deep neural network (DNN) models. For each modality, the corresponding feature descriptors are input into a DCNN to learn high-level global features with compact dynamic information, which are then fed into a DNN to predict the PHQ-8 score. For multi-modal depression recognition, the predicted PHQ-8 scores from each modality are integrated in a DNN for the final prediction. In addition, we propose the Histogram of Displacement Range as a novel global visual descriptor to quantify the range and speed of the facial landmarks' displacements. Experiments have been carried out on the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) dataset for the Depression Sub-challenge of the Audio-Visual Emotion Challenge (AVEC 2016). The results show that the proposed multi-modal depression recognition framework obtains very promising results on both the development and test sets, outperforming the state of the art.
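
To make the proposed Histogram of Displacement Range descriptor concrete, the following is a minimal NumPy sketch of one plausible formulation: within each sliding window, the range of motion of every landmark coordinate is computed and accumulated into a histogram. The input layout, window length, stride, and bin edges are illustrative assumptions; the abstract does not specify them.

import numpy as np

def histogram_of_displacement_range(landmarks, bin_edges, window=30, step=1):
    """Minimal sketch of a Histogram of Displacement Range (HDR) descriptor.

    landmarks : float array of shape (T, N, 2), the x/y positions of N
                facial landmarks over T frames (assumed input layout).
    bin_edges : 1-D array of histogram bin boundaries, in pixels
                (hypothetical values; not specified in the abstract).
    window    : sliding-window length in frames (assumption).
    step      : sliding-window stride in frames (assumption).
    """
    T = landmarks.shape[0]
    hist = np.zeros(len(bin_edges) - 1)
    for start in range(0, T - window + 1, step):
        seg = landmarks[start:start + window]           # (window, N, 2)
        # Displacement range: how far each coordinate moved in the window.
        disp_range = seg.max(axis=0) - seg.min(axis=0)  # (N, 2)
        counts, _ = np.histogram(disp_range.ravel(), bins=bin_edges)
        hist += counts
    # Normalize so recordings of different lengths are comparable.
    return hist / max(hist.sum(), 1.0)

# Example: a 10-bin histogram over displacement ranges of 0-20 pixels.
# descriptor = histogram_of_displacement_range(pts, np.linspace(0.0, 20.0, 11))

An analogous histogram over frame-to-frame displacement magnitudes would capture the speed component the abstract also mentions.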
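
The score-level fusion step can likewise be sketched as a small regression network over the per-modality PHQ-8 estimates. The layer sizes and activation below are hypothetical, not the configuration reported in the paper.

import torch
import torch.nn as nn

# Hypothetical fusion network; sizes and activation are assumptions.
fusion_dnn = nn.Sequential(
    nn.Linear(2, 16),  # input: [audio PHQ-8 estimate, visual PHQ-8 estimate]
    nn.ReLU(),
    nn.Linear(16, 1),  # output: fused PHQ-8 prediction
)

# Dummy per-modality predictions for a single subject.
audio_score = torch.tensor([[7.2]])
visual_score = torch.tensor([[9.1]])
fused = fusion_dnn(torch.cat([audio_score, visual_score], dim=1))

Fusing predicted scores rather than raw features keeps the fusion network small and lets each modality branch be trained independently, which is consistent with the per-modality pipeline the abstract describes.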