Decoupled Multi-perspective Fusion for Speech Depression Detection

Bibliographic Details
Published in IEEE transactions on affective computing pp. 1-15
Main Authors Zhao, Minghui; Gao, Hongxiang; Zhao, Lulu; Wang, Zhongyu; Wang, Fei; Zheng, Wenming; Li, Jianqing; Liu, Chengyu
Format Journal Article
Language English
Published IEEE 04.02.2025
Abstract Speech Depression Detection (SDD) has garnered attention from researchers due to its low cost and convenience. However, current algorithms lack methods for extracting interpretable acoustic features based on clinical manifestations. In addition, effectively fusing these features to overcome individual heterogeneity remains a challenge. This study proposes a decoupled multi-perspective fusion (DMPF) model. The model extracts five key features (voiceprint, emotion, pause, energy, and tremor) based on multi-perspective clinical manifestations. These features are then decoupled into common and private features, which are fused through a graph attention network to obtain a comprehensive depression representation. Notably, this study also collected a depression speech dataset, which includes standardized and comprehensive tasks along with diagnostic labels provided by psychologists. Extensive subject-independent experiments were conducted on the DAIC-WOZ, MODMA, and MPSC datasets. The voiceprint features can automatically cluster the depressed and non-depressed populations. Furthermore, DMPF can effectively fuse common and private features from different perspectives, achieving AUCs of 84.20%, 85.34%, and 86.13% on the three datasets. The results illustrate the interpretability of multi-perspective features and demonstrate that combining speech manifestations can enhance detection ability, providing a multi-perspective observational tool for physicians and clinical practice. Code is available at https://github.com/zmh56/SDD-for-DMPF-MPSC.
Author_xml – sequence: 1
  givenname: Minghui
  surname: Zhao
  fullname: Zhao, Minghui
  organization: State Key Laboratory of Digital Medical Engineering, School of Instrument Science and Engineering, Southeast University, Nanjing, China
– sequence: 2
  givenname: Hongxiang
  orcidid: 0000-0003-4121-0250
  surname: Gao
  fullname: Gao, Hongxiang
  organization: State Key Laboratory of Digital Medical Engineering, School of Instrument Science and Engineering, Southeast University, Nanjing, China
– sequence: 3
  givenname: Lulu
  orcidid: 0000-0001-5183-8741
  surname: Zhao
  fullname: Zhao, Lulu
  organization: State Key Laboratory of Digital Medical Engineering, School of Instrument Science and Engineering, Southeast University, Nanjing, China
– sequence: 4
  givenname: Zhongyu
  surname: Wang
  fullname: Wang, Zhongyu
  organization: State Key Laboratory of Digital Medical Engineering, School of Instrument Science and Engineering, Southeast University, Nanjing, China
– sequence: 5
  givenname: Fei
  surname: Wang
  fullname: Wang, Fei
  organization: Early Intervention Unit, Department of Psychiatry, Affiliated Nanjing Brain Hospital, Nanjing Medical University, Nanjing, China
– sequence: 6
  givenname: Wenming
  orcidid: 0000-0002-7764-5179
  surname: Zheng
  fullname: Zheng, Wenming
  organization: Key Laboratory of Child Development and Learning Science (Ministry of Education), School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
– sequence: 7
  givenname: Jianqing
  orcidid: 0000-0002-3524-8933
  surname: Li
  fullname: Li, Jianqing
  organization: State Key Laboratory of Digital Medical Engineering, School of Instrument Science and Engineering, Southeast University, Nanjing, China
– sequence: 8
  givenname: Chengyu
  orcidid: 0000-0003-1965-3020
  surname: Liu
  fullname: Liu, Chengyu
  organization: State Key Laboratory of Digital Medical Engineering, School of Instrument Science and Engineering, Southeast University, Nanjing, China
CODEN ITACBQ
ContentType Journal Article
DOI 10.1109/TAFFC.2025.3538519
Discipline Computer Science
EISSN 1949-3045
EndPage 15
ExternalDocumentID 10_1109_TAFFC_2025_3538519
10872825
Genre orig-research
ISSN 1949-3045
IsPeerReviewed true
IsScholarly true
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
PageCount 15
PublicationDate 20250204
PublicationDateYYYYMMDD 2025-02-04
PublicationDate_xml – month: 2
  year: 2025
  text: 20250204
  day: 4
PublicationDecade 2020
PublicationTitle IEEE transactions on affective computing
PublicationTitleAbbrev TAFFC
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 1
SubjectTerms Acoustics
Affective computing
Brain modeling
decoupled feature fusion
Depression
Electronic mail
Feature extraction
Long short term memory
Medical diagnostic imaging
Mel frequency cepstral coefficient
multi-perspective
Spectrogram
Speech depression
voiceprint contrastive learning
Title Decoupled Multi-perspective Fusion for Speech Depression Detection
URI https://ieeexplore.ieee.org/document/10872825