DCNN and DNN based multi-modal depression recognition

Bibliographic Details
Published in: International Conference on Affective Computing and Intelligent Interaction and Workshops (ACII), pp. 484-489
Main Authors: Yang, Le; Jiang, Dongmei; Han, Wenjing; Sahli, Hichem
Format: Conference Proceeding
Language: English
Published: IEEE, 01.10.2017
ISSN: 2156-8111
DOI: 10.1109/ACII.2017.8273643

Summary: In this paper, we propose an audio-visual multi-modal depression recognition framework composed of deep convolutional neural network (DCNN) and deep neural network (DNN) models. For each modality, the corresponding feature descriptors are input into a DCNN to learn high-level global features with compact dynamic information, which are then fed into a DNN to predict the PHQ-8 score. For multi-modal depression recognition, the predicted PHQ-8 scores from each modality are integrated in a DNN for the final prediction. In addition, we propose the Histogram of Displacement Range as a novel global visual descriptor to quantify the range and speed of the facial landmarks' displacements. Experiments have been carried out on the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) dataset for the Depression Sub-challenge of the Audio-Visual Emotion Challenge (AVEC 2016). The results show that the proposed multi-modal depression recognition framework obtains very promising results on both the development and test sets, outperforming the state of the art.
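
To make the proposed Histogram of Displacement Range descriptor concrete, the following is a minimal NumPy sketch of one plausible formulation: within each sliding window, the range of motion of every landmark coordinate is computed and accumulated into a histogram. The input layout, window length, stride, and bin edges are illustrative assumptions; the abstract does not specify them.

import numpy as np

def histogram_of_displacement_range(landmarks, bin_edges, window=30, step=1):
    """Minimal sketch of a Histogram of Displacement Range (HDR) descriptor.

    landmarks : float array of shape (T, N, 2), the x/y positions of N
                facial landmarks over T frames (assumed input layout).
    bin_edges : 1-D array of histogram bin boundaries, in pixels
                (hypothetical values; not specified in the abstract).
    window    : sliding-window length in frames (assumption).
    step      : sliding-window stride in frames (assumption).
    """
    T = landmarks.shape[0]
    hist = np.zeros(len(bin_edges) - 1)
    for start in range(0, T - window + 1, step):
        seg = landmarks[start:start + window]           # (window, N, 2)
        # Displacement range: how far each coordinate moved in the window.
        disp_range = seg.max(axis=0) - seg.min(axis=0)  # (N, 2)
        counts, _ = np.histogram(disp_range.ravel(), bins=bin_edges)
        hist += counts
    # Normalize so recordings of different lengths are comparable.
    return hist / max(hist.sum(), 1.0)

# Example: a 10-bin histogram over displacement ranges of 0-20 pixels.
# descriptor = histogram_of_displacement_range(pts, np.linspace(0.0, 20.0, 11))

An analogous histogram over frame-to-frame displacement magnitudes would capture the speed component the abstract also mentions.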
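
The score-level fusion step can likewise be sketched as a small regression network over the per-modality PHQ-8 estimates. The layer sizes and activation below are hypothetical, not the configuration reported in the paper.

import torch
import torch.nn as nn

# Hypothetical fusion network; sizes and activation are assumptions.
fusion_dnn = nn.Sequential(
    nn.Linear(2, 16),  # input: [audio PHQ-8 estimate, visual PHQ-8 estimate]
    nn.ReLU(),
    nn.Linear(16, 1),  # output: fused PHQ-8 prediction
)

# Dummy per-modality predictions for a single subject.
audio_score = torch.tensor([[7.2]])
visual_score = torch.tensor([[9.1]])
fused = fusion_dnn(torch.cat([audio_score, visual_score], dim=1))

Fusing predicted scores rather than raw features keeps the fusion network small and lets each modality branch be trained independently, which is consistent with the per-modality pipeline the abstract describes.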