Decoupled Multi-perspective Fusion for Speech Depression Detection
Speech Depression Detection (SDD) has garnered attention from researchers due to its low cost and convenience. However, current algorithms lack methods for extracting interpretable acoustic features based on clinical manifestations. In addition, effectively fusing these features to overcome indiv...
Published in | IEEE Transactions on Affective Computing, pp. 1-15 |
---|---|
Main Authors | Zhao, Minghui; Gao, Hongxiang; Zhao, Lulu; Wang, Zhongyu; Wang, Fei; Zheng, Wenming; Li, Jianqing; Liu, Chengyu |
Format | Journal Article |
Language | English |
Published | IEEE, 04.02.2025 |
Abstract | Speech Depression Detection (SDD) has garnered attention from researchers due to its low cost and convenience. However, current algorithms lack methods for extracting interpretable acoustic features based on clinical manifestations. In addition, effectively fusing these features to overcome individual heterogeneity remains a challenge. This study proposes a decoupled multi-perspective fusion (DMPF) model. The model extracts five key features (voiceprint, emotion, pause, energy, and tremor) based on multi-perspective clinical manifestations. These features are then decoupled into common and private features, which are fused through a graph attention network to obtain a comprehensive depression representation. Notably, this study collected a depression speech dataset that includes standardized and comprehensive tasks along with diagnostic labels provided by psychologists. Extensive subject-independent experiments were conducted on the DAIC-WOZ, MODMA, and MPSC datasets. The voiceprint features can automatically cluster the depressed and non-depressed populations. Furthermore, DMPF can effectively fuse common and private features from different perspectives, achieving AUCs of 84.20%, 85.34%, and 86.13% on the three datasets. The results illustrate the interpretability of multi-perspective features and demonstrate that combining speech manifestations can enhance detection ability, providing a multi-perspective observational tool for physicians and clinical practice. Code is available at https://github.com/zmh56/SDD-for-DMPF-MPSC. |
---|---|
Author | Zhao, Lulu; Wang, Fei; Wang, Zhongyu; Zhao, Minghui; Li, Jianqing; Liu, Chengyu; Gao, Hongxiang; Zheng, Wenming |
Author_xml | – sequence: 1 givenname: Minghui surname: Zhao fullname: Zhao, Minghui organization: State Key Laboratory of Digital Medical Engineering, School of Instrument Science and Engineering, Southeast University, Nanjing, China – sequence: 2 givenname: Hongxiang orcidid: 0000-0003-4121-0250 surname: Gao fullname: Gao, Hongxiang organization: State Key Laboratory of Digital Medical Engineering, School of Instrument Science and Engineering, Southeast University, Nanjing, China – sequence: 3 givenname: Lulu orcidid: 0000-0001-5183-8741 surname: Zhao fullname: Zhao, Lulu organization: State Key Laboratory of Digital Medical Engineering, School of Instrument Science and Engineering, Southeast University, Nanjing, China – sequence: 4 givenname: Zhongyu surname: Wang fullname: Wang, Zhongyu organization: State Key Laboratory of Digital Medical Engineering, School of Instrument Science and Engineering, Southeast University, Nanjing, China – sequence: 5 givenname: Fei surname: Wang fullname: Wang, Fei organization: Early Intervention Unit, Department of Psychiatry, Affiliated Nanjing Brain Hospital, Nanjing Medical University, Nanjing, China – sequence: 6 givenname: Wenming orcidid: 0000-0002-7764-5179 surname: Zheng fullname: Zheng, Wenming organization: Key Laboratory of Child Development and Learning Science (Ministry of Education), School of Biological Science and Medical Engineering, Southeast University, Nanjing, China – sequence: 7 givenname: Jianqing orcidid: 0000-0002-3524-8933 surname: Li fullname: Li, Jianqing organization: State Key Laboratory of Digital Medical Engineering, School of Instrument Science and Engineering, Southeast University, Nanjing, China – sequence: 8 givenname: Chengyu orcidid: 0000-0003-1965-3020 surname: Liu fullname: Liu, Chengyu organization: State Key Laboratory of Digital Medical Engineering, School of Instrument Science and Engineering, Southeast University, Nanjing, China |
CODEN | ITACBQ |
ContentType | Journal Article |
DBID | 97E RIA RIE AAYXX CITATION |
DOI | 10.1109/TAFFC.2025.3538519 |
DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present; IEEE All-Society Periodicals Package (ASPP) 1998–Present; IEEE Electronic Library (IEL); CrossRef |
DatabaseTitle | CrossRef |
Discipline | Computer Science |
EISSN | 1949-3045 |
EndPage | 15 |
ExternalDocumentID | 10_1109_TAFFC_2025_3538519 10872825 |
Genre | orig-research |
ISSN | 1949-3045 |
IsPeerReviewed | true |
IsScholarly | true |
Language | English |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
ORCID | 0000-0002-3524-8933 0000-0001-5183-8741 0000-0003-4121-0250 0000-0003-1965-3020 0000-0002-7764-5179 |
PageCount | 15 |
PublicationCentury | 2000 |
PublicationDate | 20250204 |
PublicationDateYYYYMMDD | 2025-02-04 |
PublicationDate_xml | – month: 2 year: 2025 text: 20250204 day: 4 |
PublicationDecade | 2020 |
PublicationTitle | IEEE transactions on affective computing |
PublicationTitleAbbrev | TAFFC |
PublicationYear | 2025 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
StartPage | 1 |
SubjectTerms | Acoustics; Affective computing; Brain modeling; decoupled feature fusion; Depression; Electronic mail; Feature extraction; Long short term memory; Medical diagnostic imaging; Mel frequency cepstral coefficient; multi-perspective; Spectrogram; Speech depression; voiceprint contrastive learning |
Title | Decoupled Multi-perspective Fusion for Speech Depression Detection |
URI | https://ieeexplore.ieee.org/document/10872825 |
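
As a rough illustration of the AUC metric quoted in the abstract, the following Python sketch computes ROC AUC as the probability that a randomly chosen depressed subject receives a higher score than a randomly chosen non-depressed subject, with ties counted as one half. This is not the authors' evaluation code: the function name `roc_auc` and the toy labels and scores are hypothetical, and treating each entry as one subject-level prediction is an assumption made only for the example.

```python
def roc_auc(labels, scores):
    """Pairwise (rank-based) ROC AUC; labels are 1 (positive) or 0 (negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: one score per subject, 1 = depressed, 0 = non-depressed.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

On this toy data, eight of the nine positive/negative pairs are ranked correctly, so the sketch yields 8/9. The pairwise loop is quadratic in the number of subjects, which is acceptable for small clinical datasets; a sort-based rank formulation would be the usual choice at larger scale.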