DEP-Former: Multimodal Depression Recognition Based on Facial Expressions and Audio Features via Emotional Changes

Published in IEEE Transactions on Circuits and Systems for Video Technology, Vol. 35, No. 3, pp. 2087–2100
Main Authors Ye, Jiayu; Yu, Yanhong; Lu, Lin; Wang, Hao; Zheng, Yunshao; Liu, Yang; Wang, Qingxiang
Format Journal Article
Language English
Published IEEE, 01.03.2025
Abstract Clinical research has demonstrated that exploring behavioral signal differences between depressed patients and non-depressed people using audiovisual technology is an effective approach to depression recognition. Hence, in this paper we propose an emotion word reading experiment (EWRE) and extract features from facial expressions and audio for depression recognition. Building upon this, we propose a depression recognition model (DEP-Former) that deeply integrates multimodal features. DEP-Former first employs a modality adapter to achieve emotion-space mapping and the sharing of multimodal features, addressing cross-modal inconsistencies. Simultaneously, it introduces an attention-index-sharing mechanism that overcomes the limitations of cognitive subjectivity by calculating the confidence of key emotional information across modalities. Finally, we propose a multimodal cross-attention module and a Bernoulli-distribution feature fusion prediction module to achieve deep integration of multilevel information, thereby enabling depression recognition. Compared with existing advanced multimodal models, DEP-Former demonstrates superior performance on EWRE, achieving an accuracy of 0.9500 and an F1 score of 0.9499, and significantly enhances depression recognition over single-modality methods. Furthermore, its robust generalization ability is validated on the AVEC 2014 dataset. Through the attention queries of the interpretability analysis module, we find that depressed patients exhibit heightened sensitivity to negative emotional words, such as "dismissal" and "tragedy," whereas healthy individuals tend to be more attuned to positive emotional words, including "passion," "purity," and "justice." Additionally, depressed patients exhibit a degree of psychological-state diversity, showing sensitivity to some positive emotional words as well. Our code and data are available at https://github.com/QLUTEmoTechCrew/DEP-Former.
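To make the fusion pipeline described in the abstract more concrete, the following is a minimal, illustrative PyTorch sketch of cross-modal attention between facial and audio feature sequences followed by a Bernoulli-gated fusion classifier. It is not the authors' implementation (their code is in the repository linked above); the module names, feature dimensions, pooling, and gating scheme here are hypothetical assumptions made only for illustration.

# Minimal sketch (not the authors' code): cross-modal attention between facial
# and audio features, followed by a Bernoulli-gated fusion classifier.
# All dimensions, names, and the gating scheme are illustrative assumptions.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        # Modality adapters: project each modality into a shared emotion space.
        self.face_adapter = nn.Linear(dim, dim)
        self.audio_adapter = nn.Linear(dim, dim)
        # Cross-attention in both directions (face queries audio, audio queries face).
        self.face_to_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.audio_to_face = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Gate producing a per-sample Bernoulli probability for mixing the two views.
        self.gate = nn.Sequential(nn.Linear(2 * dim, 1), nn.Sigmoid())
        self.classifier = nn.Linear(dim, 2)  # depressed vs. non-depressed

    def forward(self, face_feats, audio_feats):
        # face_feats: (batch, T_face, dim); audio_feats: (batch, T_audio, dim)
        f = self.face_adapter(face_feats)
        a = self.audio_adapter(audio_feats)
        f_att, _ = self.face_to_audio(query=f, key=a, value=a)  # face attends to audio
        a_att, _ = self.audio_to_face(query=a, key=f, value=f)  # audio attends to face
        f_vec = f_att.mean(dim=1)  # average-pool over time
        a_vec = a_att.mean(dim=1)
        p = self.gate(torch.cat([f_vec, a_vec], dim=-1))  # Bernoulli parameter in (0, 1)
        # Sample a hard modality mix during training; use the expected value at test time.
        g = torch.bernoulli(p) if self.training else p
        fused = g * f_vec + (1.0 - g) * a_vec
        return self.classifier(fused)  # logits over {depressed, non-depressed}

if __name__ == "__main__":
    model = CrossModalFusion()
    logits = model(torch.randn(8, 30, 256), torch.randn(8, 50, 256))
    print(logits.shape)  # torch.Size([8, 2])

A usage note: sampling from a Bernoulli gate is non-differentiable as written, so a real training setup would need a relaxation (e.g., straight-through or expected-value gating); this sketch only illustrates an adapter → cross-attention → gated-fusion → prediction data flow.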
Author Lu, Lin
Zheng, Yunshao
Liu, Yang
Ye, Jiayu
Wang, Qingxiang
Yu, Yanhong
Wang, Hao
Author_xml – sequence: 1
  givenname: Jiayu
  orcidid: 0000-0003-0368-9651
  surname: Ye
  fullname: Ye, Jiayu
  organization: School of Computer Science, Guangdong University of Technology, Guangzhou, China
– sequence: 2
  givenname: Yanhong
  orcidid: 0000-0002-6547-6320
  surname: Yu
  fullname: Yu, Yanhong
  organization: College of Traditional Chinese Medicine, Shandong University of Traditional Chinese Medicine, Jinan, China
– sequence: 3
  givenname: Lin
  orcidid: 0009-0006-6194-8577
  surname: Lu
  fullname: Lu, Lin
  organization: Shandong Mental Health Center, Shandong University, Jinan, China
– sequence: 4
  givenname: Hao
  orcidid: 0009-0003-9861-5993
  surname: Wang
  fullname: Wang, Hao
  organization: Shandong Mental Health Center, Shandong University, Jinan, China
– sequence: 5
  givenname: Yunshao
  orcidid: 0009-0008-0076-2968
  surname: Zheng
  fullname: Zheng, Yunshao
  organization: Shandong Mental Health Center, Shandong University, Jinan, China
– sequence: 6
  givenname: Yang
  orcidid: 0009-0005-0759-2728
  surname: Liu
  fullname: Liu, Yang
  organization: Endocrinology, The Fifth Clinical College, Guangzhou, China
– sequence: 7
  givenname: Qingxiang
  orcidid: 0000-0002-8159-7739
  surname: Wang
  fullname: Wang, Qingxiang
  email: wangqx@qlu.edu.cn
  organization: Shandong Mental Health Center, Shandong University, Jinan, China
CODEN ITCTEM
ContentType Journal Article
DOI 10.1109/TCSVT.2024.3491098
Discipline Engineering
EISSN 1558-2205
EndPage 2100
Genre orig-research
GrantInformation_xml – fundername: Shandong Provincial Natural Science Foundation, China
  grantid: ZR2021MF079
  funderid: 10.13039/501100020196
– fundername: Key Research and Development Program of Shandong Province
  grantid: 2021SFGC0504
  funderid: 10.13039/501100019033
– fundername: Science and Technology Development Plan of Jinan (Clinical Medicine Science and Technology Innovation Plan)
  grantid: 20225054
ISSN 1051-8215
IsPeerReviewed true
IsScholarly true
Issue 3
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0009-0003-9861-5993
0009-0005-0759-2728
0000-0002-6547-6320
0009-0008-0076-2968
0000-0003-0368-9651
0009-0006-6194-8577
0000-0002-8159-7739
PageCount 14
PublicationDate 2025-03-01
PublicationTitle IEEE transactions on circuits and systems for video technology
PublicationTitleAbbrev TCSVT
PublicationYear 2025
Publisher IEEE
StartPage 2087
SubjectTerms audio features
Circuits and systems
Data collection
Data models
Deep learning
DEP-Former
Depression
depression recognition
Emotion recognition
EWRE
Face recognition
facial expressions
Feature extraction
Indexes
Mental health
Title DEP-Former: Multimodal Depression Recognition Based on Facial Expressions and Audio Features via Emotional Changes
URI https://ieeexplore.ieee.org/document/10742391
Volume 35