Decoupled Multi-perspective Fusion for Speech Depression Detection

Bibliographic Details
Published in: IEEE Transactions on Affective Computing, pp. 1-15
Main Authors: Zhao, Minghui; Gao, Hongxiang; Zhao, Lulu; Wang, Zhongyu; Wang, Fei; Zheng, Wenming; Li, Jianqing; Liu, Chengyu
Format: Journal Article
Language: English
Published: IEEE, 04.02.2025

More Information
Summary: Speech Depression Detection (SDD) has garnered attention from researchers due to its low cost and convenience. However, current algorithms lack methods for extracting interpretable acoustic features based on clinical manifestations. In addition, effectively fusing these features to overcome individual heterogeneity remains a challenge. This study proposes a decoupled multi-perspective fusion (DMPF) model. The model extracts five key features (voiceprint, emotion, pause, energy, and tremor) based on multi-perspective clinical manifestations. These features are then decoupled into common and private components, which are fused through a graph attention network to obtain a comprehensive depression representation. Notably, this study collected a depression speech dataset that includes standardized, comprehensive tasks along with diagnostic labels provided by psychologists. Extensive subject-independent experiments were conducted on the DAIC-WOZ, MODMA, and MPSC datasets. The voiceprint features automatically cluster the depressed and non-depressed populations. Furthermore, DMPF effectively fuses common and private features from different perspectives, achieving AUCs of 84.20%, 85.34%, and 86.13% on the three datasets. The results illustrate the interpretability of the multi-perspective features and demonstrate that combining speech manifestations enhances detection ability, providing a multi-perspective observational tool for physicians and clinical practice. Code is available at https://github.com/zmh56/SDD-for-DMPF-MPSC.
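The fusion idea summarized above (per-perspective features decoupled into common and private components, then combined with graph attention) can be illustrated with a short PyTorch sketch. This is not the authors' implementation (see the linked repository for the official code); the module names, dimensions, and the simple single-head attention layer below are assumptions made only for demonstration.

```python
# Illustrative sketch of decoupled multi-perspective fusion (hypothetical,
# not the authors' code). Five perspective features (voiceprint, emotion,
# pause, energy, tremor) are embedded, split into shared "common" and
# per-perspective "private" parts, and fused by graph attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Single-head graph attention over a fully connected perspective graph."""
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)
        self.a = nn.Linear(2 * dim, 1, bias=False)

    def forward(self, h):                      # h: (batch, nodes, dim)
        Wh = self.W(h)
        n = Wh.size(1)
        # Pairwise concatenation to compute attention logits e_ij.
        hi = Wh.unsqueeze(2).expand(-1, -1, n, -1)
        hj = Wh.unsqueeze(1).expand(-1, n, -1, -1)
        e = F.leaky_relu(self.a(torch.cat([hi, hj], dim=-1)).squeeze(-1))
        alpha = torch.softmax(e, dim=-1)       # attention over neighbours
        return F.elu(torch.matmul(alpha, Wh))  # aggregated node features

class DMPFSketch(nn.Module):
    def __init__(self, feat_dims, dim=64, n_classes=2):
        super().__init__()
        self.encoders = nn.ModuleList(nn.Linear(d, dim) for d in feat_dims)
        self.common = nn.Linear(dim, dim)      # shared projector (common part)
        self.privates = nn.ModuleList(nn.Linear(dim, dim) for _ in feat_dims)
        self.gat = GraphAttentionLayer(dim)
        self.cls = nn.Linear(dim, n_classes)

    def forward(self, feats):                  # list of (batch, feat_dims[i])
        nodes = []
        for x, enc, priv in zip(feats, self.encoders, self.privates):
            h = torch.relu(enc(x))
            nodes.append(self.common(h) + priv(h))  # decoupled, then recombined
        h = torch.stack(nodes, dim=1)          # (batch, 5, dim) perspective graph
        fused = self.gat(h).mean(dim=1)        # graph-attention fusion + pooling
        return self.cls(fused)

# Toy usage: random tensors stand in for the five perspective feature vectors.
model = DMPFSketch(feat_dims=[128, 64, 16, 16, 32])
feats = [torch.randn(8, d) for d in [128, 64, 16, 16, 32]]
print(model(feats).shape)                      # torch.Size([8, 2])
```

In this sketch the "decoupling" is reduced to separate shared and perspective-specific projections; the paper's actual decoupling objective and graph construction may differ and should be taken from the official repository.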
ISSN: 1949-3045
DOI: 10.1109/TAFFC.2025.3538519