DEP-Former: Multimodal Depression Recognition Based on Facial Expressions and Audio Features via Emotional Changes

Published in IEEE Transactions on Circuits and Systems for Video Technology, Vol. 35, No. 3, pp. 2087–2100
Main Authors Ye, Jiayu; Yu, Yanhong; Lu, Lin; Wang, Hao; Zheng, Yunshao; Liu, Yang; Wang, Qingxiang
Format Journal Article
Language English
Published IEEE, 01.03.2025
Abstract Clinical research has demonstrated that exploring behavioral signal differences between depressed patients and non-depressed people using audiovisual technology is an effective approach to depression recognition. Hence, in this paper we propose an emotion word reading experiment (EWRE) and extract features from facial expressions and audio for depression recognition. Building upon this, we propose a depression recognition model (DEP-Former) that deeply integrates multimodal features. DEP-Former first employs a modality adapter to achieve emotion-space mapping and the sharing of multimodal features, addressing cross-modal inconsistencies. Simultaneously, it introduces an attention-index-sharing mechanism that overcomes the limitations of cognitive subjectivity by calculating the confidence of key emotional information across modalities. Finally, we propose a multimodal cross-attention module and a Bernoulli-distribution feature fusion prediction module to achieve deep integration of multilevel information, thereby enabling depression recognition. Compared with existing advanced multimodal models, DEP-Former demonstrates superior performance on EWRE, achieving an accuracy of 0.9500 and an F1 score of 0.9499, and significantly enhances depression recognition over single-modality methods. Furthermore, its robust generalization ability is validated on the AVEC 2014 dataset. Through the attention queries of the interpretability analysis module, we find that depressed patients exhibit heightened sensitivity to negative emotional words, such as "dismissal" and "tragedy," whereas healthy individuals tend to be more attuned to positive emotional words, including "passion," "purity," and "justice." Additionally, depressed patients exhibit a degree of psychological-state diversity, showing sensitivity to some positive emotional words as well. Our code and data are available at https://github.com/QLUTEmoTechCrew/DEP-Former.
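To make the fusion pipeline described in the abstract more concrete, the following is a minimal, illustrative PyTorch sketch of cross-modal attention between facial and audio feature sequences followed by a Bernoulli-gated fusion classifier. It is not the authors' implementation (their code is in the repository linked above); the module names, feature dimensions, pooling, and gating scheme here are hypothetical assumptions made only for illustration.

# Minimal sketch (not the authors' code): cross-modal attention between facial
# and audio features, followed by a Bernoulli-gated fusion classifier.
# All dimensions, names, and the gating scheme are illustrative assumptions.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        # Modality adapters: project each modality into a shared emotion space.
        self.face_adapter = nn.Linear(dim, dim)
        self.audio_adapter = nn.Linear(dim, dim)
        # Cross-attention in both directions (face queries audio, audio queries face).
        self.face_to_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.audio_to_face = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Gate producing a per-sample Bernoulli probability for mixing the two views.
        self.gate = nn.Sequential(nn.Linear(2 * dim, 1), nn.Sigmoid())
        self.classifier = nn.Linear(dim, 2)  # depressed vs. non-depressed

    def forward(self, face_feats, audio_feats):
        # face_feats: (batch, T_face, dim); audio_feats: (batch, T_audio, dim)
        f = self.face_adapter(face_feats)
        a = self.audio_adapter(audio_feats)
        f_att, _ = self.face_to_audio(query=f, key=a, value=a)  # face attends to audio
        a_att, _ = self.audio_to_face(query=a, key=f, value=f)  # audio attends to face
        f_vec = f_att.mean(dim=1)  # average-pool over time
        a_vec = a_att.mean(dim=1)
        p = self.gate(torch.cat([f_vec, a_vec], dim=-1))  # Bernoulli parameter in (0, 1)
        # Sample a hard modality mix during training; use the expected value at test time.
        g = torch.bernoulli(p) if self.training else p
        fused = g * f_vec + (1.0 - g) * a_vec
        return self.classifier(fused)  # logits over {depressed, non-depressed}

if __name__ == "__main__":
    model = CrossModalFusion()
    logits = model(torch.randn(8, 30, 256), torch.randn(8, 50, 256))
    print(logits.shape)  # torch.Size([8, 2])

A usage note: sampling from a Bernoulli gate is non-differentiable as written, so a real training setup would need a relaxation (e.g., straight-through or expected-value gating); this sketch only illustrates an adapter → cross-attention → gated-fusion → prediction data flow.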
Author Lu, Lin
Zheng, Yunshao
Liu, Yang
Ye, Jiayu
Wang, Qingxiang
Yu, Yanhong
Wang, Hao
Author_xml – sequence: 1
  givenname: Jiayu
  orcidid: 0000-0003-0368-9651
  surname: Ye
  fullname: Ye, Jiayu
  organization: School of Computer Science, Guangdong University of Technology, Guangzhou, China
– sequence: 2
  givenname: Yanhong
  orcidid: 0000-0002-6547-6320
  surname: Yu
  fullname: Yu, Yanhong
  organization: College of Traditional Chinese Medicine, Shandong University of Traditional Chinese Medicine, Jinan, China
– sequence: 3
  givenname: Lin
  orcidid: 0009-0006-6194-8577
  surname: Lu
  fullname: Lu, Lin
  organization: Shandong Mental Health Center, Shandong University, Jinan, China
– sequence: 4
  givenname: Hao
  orcidid: 0009-0003-9861-5993
  surname: Wang
  fullname: Wang, Hao
  organization: Shandong Mental Health Center, Shandong University, Jinan, China
– sequence: 5
  givenname: Yunshao
  orcidid: 0009-0008-0076-2968
  surname: Zheng
  fullname: Zheng, Yunshao
  organization: Shandong Mental Health Center, Shandong University, Jinan, China
– sequence: 6
  givenname: Yang
  orcidid: 0009-0005-0759-2728
  surname: Liu
  fullname: Liu, Yang
  organization: Endocrinology, The Fifth Clinical College, Guangzhou, China
– sequence: 7
  givenname: Qingxiang
  orcidid: 0000-0002-8159-7739
  surname: Wang
  fullname: Wang, Qingxiang
  email: wangqx@qlu.edu.cn
  organization: Shandong Mental Health Center, Shandong University, Jinan, China
CODEN ITCTEM
ContentType Journal Article
DOI 10.1109/TCSVT.2024.3491098
Discipline Engineering
EISSN 1558-2205
EndPage 2100
Genre orig-research
GrantInformation_xml – fundername: Shandong Provincial Natural Science Foundation, China
  grantid: ZR2021MF079
  funderid: 10.13039/501100020196
– fundername: Key Research and Development Program of Shandong Province
  grantid: 2021SFGC0504
  funderid: 10.13039/501100019033
– fundername: Science and Technology Development Plan of Jinan (Clinical Medicine Science and Technology Innovation Plan)
  grantid: 20225054
ISSN 1051-8215
IsPeerReviewed true
IsScholarly true
Issue 3
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0009-0003-9861-5993
0009-0005-0759-2728
0000-0002-6547-6320
0009-0008-0076-2968
0000-0003-0368-9651
0009-0006-6194-8577
0000-0002-8159-7739
PageCount 14
PublicationDate 2025-03-01
PublicationTitle IEEE transactions on circuits and systems for video technology
PublicationTitleAbbrev TCSVT
PublicationYear 2025
Publisher IEEE
StartPage 2087
SubjectTerms audio features
Circuits and systems
Data collection
Data models
Deep learning
DEP-Former
Depression
depression recognition
Emotion recognition
EWRE
Face recognition
facial expressions
Feature extraction
Indexes
Mental health
Title DEP-Former: Multimodal Depression Recognition Based on Facial Expressions and Audio Features via Emotional Changes
URI https://ieeexplore.ieee.org/document/10742391
Volume 35