DEP-Former: Multimodal Depression Recognition Based on Facial Expressions and Audio Features via Emotional Changes
Published in | IEEE Transactions on Circuits and Systems for Video Technology, Vol. 35, no. 3, pp. 2087-2100 |
---|---|
Main Authors | Ye, Jiayu; Yu, Yanhong; Lu, Lin; Wang, Hao; Zheng, Yunshao; Liu, Yang; Wang, Qingxiang |
Format | Journal Article |
Language | English |
Published | IEEE, 01.03.2025 |
Subjects | audio features; Circuits and systems; Data collection; Data models; Deep learning; DEP-Former; Depression; depression recognition; Emotion recognition; EWRE; Face recognition; facial expressions; Feature extraction; Indexes; Mental health |
Abstract | Clinical research has demonstrated that exploring behavioral signal differences between depressed patients and non-depressed people using audiovisual technology is an effective approach to depression recognition. Hence, in this paper we propose an emotion word reading experiment (EWRE) and extract features from facial expressions and audio for depression recognition. Building on this, we propose a depression recognition model (DEP-Former) that deeply integrates multimodal features. DEP-Former first designs a modality adapter to achieve emotion-space mapping and the sharing of multimodal features, addressing cross-modal inconsistencies. Simultaneously, it proposes an attention-index-sharing mechanism that overcomes the limitations of cognitive subjectivity by calculating confidence in key emotional information across modalities. Finally, we propose a multimodal cross-attention module and a Bernoulli-distribution feature fusion prediction module to achieve deep integration of multilevel information, thereby enabling depression recognition. Compared with existing advanced multimodal models, DEP-Former demonstrates superior performance on EWRE, achieving an accuracy of 0.9500 and an F1 score of 0.9499, significantly outperforming single-modality methods. Furthermore, its robust generalization ability is validated on the AVEC 2014 dataset. Through the attention query of the interpretability analysis module, we discover that depressed patients exhibit heightened sensitivity to negative emotional words, such as dismissal and tragedy. In contrast, healthy individuals tend to be more attuned to positive emotional words, including passion, purity, and justice. Additionally, depressed patients exhibit a degree of psychological-state diversity, showing sensitivity to some positive emotional words as well. Our code and data are available at https://github.com/QLUTEmoTechCrew/DEP-Former . |
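The multimodal cross-attention the abstract describes — one modality's features querying the other's — can be illustrated with a minimal NumPy sketch. This is a generic cross-attention computation under assumed shapes (the token counts, feature dimension, and function names here are hypothetical), not the authors' DEP-Former implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, d_k):
    """Queries from one modality attend over tokens of the other.

    Returns the attended (fused) features and the attention weights,
    which is what an interpretability module could inspect to see
    which tokens of the other modality each query focused on.
    """
    scores = queries @ keys_values.T / np.sqrt(d_k)   # (nq, nk) similarity
    weights = softmax(scores, axis=-1)                # rows sum to 1
    return weights @ keys_values, weights

rng = np.random.default_rng(0)
face = rng.standard_normal((4, 8))    # 4 facial-expression tokens, dim 8
audio = rng.standard_normal((6, 8))   # 6 audio tokens, dim 8

# Facial queries attend over audio tokens; the result has one fused
# audio-informed vector per facial token.
fused_face, w = cross_attention(face, audio, d_k=8)
```

In a full model this step would run in both directions (face→audio and audio→face) before the downstream fusion and prediction stages.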
---|---|
Author | Lu, Lin; Zheng, Yunshao; Liu, Yang; Ye, Jiayu; Wang, Qingxiang; Yu, Yanhong; Wang, Hao |
Author_xml | – sequence: 1 givenname: Jiayu orcidid: 0000-0003-0368-9651 surname: Ye fullname: Ye, Jiayu organization: School of Computer Science, Guangdong University of Technology, Guangzhou, China
– sequence: 2 givenname: Yanhong orcidid: 0000-0002-6547-6320 surname: Yu fullname: Yu, Yanhong organization: College of Traditional Chinese Medicine, Shandong University of Traditional Chinese Medicine, Jinan, China
– sequence: 3 givenname: Lin orcidid: 0009-0006-6194-8577 surname: Lu fullname: Lu, Lin organization: Shandong Mental Health Center, Shandong University, Jinan, China
– sequence: 4 givenname: Hao orcidid: 0009-0003-9861-5993 surname: Wang fullname: Wang, Hao organization: Shandong Mental Health Center, Shandong University, Jinan, China
– sequence: 5 givenname: Yunshao orcidid: 0009-0008-0076-2968 surname: Zheng fullname: Zheng, Yunshao organization: Shandong Mental Health Center, Shandong University, Jinan, China
– sequence: 6 givenname: Yang orcidid: 0009-0005-0759-2728 surname: Liu fullname: Liu, Yang organization: Endocrinology, The Fifth Clinical College, Guangzhou, China
– sequence: 7 givenname: Qingxiang orcidid: 0000-0002-8159-7739 surname: Wang fullname: Wang, Qingxiang email: wangqx@qlu.edu.cn organization: Shandong Mental Health Center, Shandong University, Jinan, China |
CODEN | ITCTEM |
ContentType | Journal Article |
DOI | 10.1109/TCSVT.2024.3491098 |
Discipline | Engineering |
EISSN | 1558-2205 |
EndPage | 2100 |
ExternalDocumentID | 10_1109_TCSVT_2024_3491098 10742391 |
Genre | orig-research |
GrantInformation_xml | – fundername: Shandong Provincial Natural Science Foundation, China grantid: ZR2021MF079 funderid: 10.13039/501100020196
– fundername: Key Research and Development Program of Shandong Province grantid: 2021SFGC0504 funderid: 10.13039/501100019033
– fundername: Science and Technology Development Plan of Jinan (Clinical Medicine Science and Technology Innovation Plan) grantid: 20225054 |
ISSN | 1051-8215 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 3 |
Language | English |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
ORCID | 0009-0003-9861-5993 0009-0005-0759-2728 0000-0002-6547-6320 0009-0008-0076-2968 0000-0003-0368-9651 0009-0006-6194-8577 0000-0002-8159-7739 |
PageCount | 14 |
PublicationDate | 2025-03-01 |
PublicationTitle | IEEE transactions on circuits and systems for video technology |
PublicationTitleAbbrev | TCSVT |
PublicationYear | 2025 |
Publisher | IEEE |
StartPage | 2087 |
SubjectTerms | audio features; Circuits and systems; Data collection; Data models; Deep learning; DEP-Former; Depression; depression recognition; Emotion recognition; EWRE; Face recognition; facial expressions; Feature extraction; Indexes; Mental health |
Title | DEP-Former: Multimodal Depression Recognition Based on Facial Expressions and Audio Features via Emotional Changes |
URI | https://ieeexplore.ieee.org/document/10742391 |
Volume | 35 |