A hierarchical depression detection model based on vocal and emotional cues

Bibliographic Details
Published in: Neurocomputing (Amsterdam), Vol. 441, pp. 279–290
Main Authors: Dong, Yizhuo; Yang, Xinyu
Format: Journal Article
Language: English
Published: Elsevier B.V., 21.06.2021
Summary: Effective and efficient automatic depression diagnosis is a challenging problem in affective computing. Since speech signals carry useful information for diagnosing depression, in this paper we propose to extract deep speaker recognition (SR) and speech emotion recognition (SER) features using pretrained models, and to fuse these two deep speech features to exploit the complementary information between speakers' vocal and emotional differences. In addition, because the amount of data for depression recognition is small and the diagnosis results are cost-sensitive, we propose a hierarchical depression detection model in which multiple classifiers are placed before a regressor to guide the prediction of depression severity. We evaluate our method on the AVEC 2013 and AVEC 2014 benchmark databases. The results demonstrate that fusing deep SR and SER features improves the model's prediction performance. Using only audio features, the proposed method avoids overfitting, outperforms previous audio-based methods on both databases, and yields results comparable to those of video-based and multimodal methods for depression detection.
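
The abstract describes two ideas: feature-level fusion of pretrained SR and SER embeddings, and a hierarchy in which coarse classifiers precede a severity regressor. Below is a minimal Python sketch of that pipeline under stated assumptions: the embedding sizes, the random placeholder features, the BDI-II-style score range, the band threshold of 20, and the choice of SVM/SVR models are all illustrative, not the authors' exact configuration.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)

# Placeholder "deep features": in the paper these come from pretrained SR and
# SER networks; here they are random vectors just to make the pipeline run.
n_samples = 200
sr_feats = rng.normal(size=(n_samples, 512))   # assumed SR embedding size
ser_feats = rng.normal(size=(n_samples, 256))  # assumed SER embedding size
scores = rng.uniform(0, 63, size=n_samples)    # BDI-II-like severity scores

# (1) Feature-level fusion: concatenate the two embeddings per sample.
fused = np.hstack([sr_feats, ser_feats])
fused = StandardScaler().fit_transform(fused)

# (2) Hierarchical detection: a coarse classifier first assigns a severity
# band, then a per-band regressor refines the numeric score. Splitting into
# "low" vs "high" at a score of 20 is a hypothetical threshold.
bands = (scores >= 20).astype(int)
clf = SVC().fit(fused, bands)

regressors = {}
for band in (0, 1):
    mask = bands == band
    regressors[band] = SVR().fit(fused[mask], scores[mask])

def predict(x):
    # Route the sample through the classifier, then the matching regressor.
    band = int(clf.predict(x.reshape(1, -1))[0])
    return regressors[band].predict(x.reshape(1, -1))[0]

print(predict(fused[0]))

Gating the regressor behind classifiers narrows the score range each regressor must cover, which is one plausible reading of how the hierarchy "guides" severity prediction on small, cost-sensitive data.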
ISSN: 0925-2312, 1872-8286
DOI: 10.1016/j.neucom.2021.02.019